SQL Percentile Calculation

I have the following query, which is a bit slow to execute even without a ton of data (~3k rows), and the logic is a bit over my head. I was hoping to get some help optimizing the query, or even an alternate methodology:
Select companypartnumber,
       (PartTotal + IsNull(Cum_Lower_Ranks, 0)) / Sum(PartTotal) over() * 100 as Cum_PC_Of_Total
FROM PartSalesRankings PSRMain
Left join
(
    Select PSRTop.Item_Rank, Sum(PSRBelow.PartTotal) as Cum_Lower_Ranks
    from PartSalesRankings PSRTop
    Left join PartSalesRankings PSRBelow on PSRBelow.Item_Rank < PSRTop.Item_Rank
    Group by PSRTop.Item_Rank
) as PSRLowerCums on PSRLowerCums.Item_Rank = PSRMain.Item_Rank
The PartSalesRankings table simply consists of CompanyPartNumber (bigint), which is a part number designation; PartTotal (decimal(38,5)), which is the total sales; and Item_Rank (bigint), which is the rank of the item based on total sales.
I'm trying to end up with my parts in categories based on their percentile - so an "A" item would be in the top 5%, a "B" item in the next 15%, and "C" items in the bottom 80%. The view I created works fine, it just takes almost three seconds to execute, which for my purposes is quite slow. I narrowed the bottleneck down to the above query - any help would be greatly appreciated.

The problem you are having is the calculation of the cumulative sum of PartTotal. If you are using SQL Server 2012, you can do something like:
select (case when ratio <= 0.05 then 'A'
             when ratio <= 0.20 then 'B'
             else 'C'
        end),
       t.*
from (select psr.companypartnumber,
             -- cumulative share of total sales, largest sellers first,
             -- so the parts making up the top 5% of sales get ratio <= 0.05
             (sum(PartTotal) over (order by PartTotal desc) * 1.0 / sum(PartTotal) over ()) as ratio
      FROM PartSalesRankings psr
     ) t
SQL Server 2012 also has percentile functions and other analytic functions that are not in earlier versions.
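For example, here is a minimal sketch using CUME_DIST, one of the 2012 additions. Note that it buckets parts by their position in the ranking (their share of the part count), not by their share of total sales value, so it answers a slightly different question than the cumulative-sum query above:
select companypartnumber,
       (case when cume_dist() over (order by PartTotal desc) <= 0.05 then 'A'
             when cume_dist() over (order by PartTotal desc) <= 0.20 then 'B'
             else 'C'
        end) as Category   -- share of parts, not share of sales
from PartSalesRankings;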
In earlier versions, the question is how to get the cumulative sum efficiently. Your query is probably as good as anything that can be done in one query. Can the cumulative sum be calculated when partSalesRankings is created? Can you use temporary tables?
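As a sketch of that last idea, assuming PartSalesRankings is rebuilt whenever the rankings change and Item_Rank uniquely orders the parts, the cumulative sum could be stored in a new column (here called CumTotal, name arbitrary) so the view only has to divide:
-- populate a cumulative-total column at build time instead of
-- recomputing the triangular self-join on every query
alter table PartSalesRankings add CumTotal decimal(38,5) null;

update PSR
set CumTotal = (select sum(PSR2.PartTotal)
                from PartSalesRankings PSR2
                where PSR2.Item_Rank <= PSR.Item_Rank)
from PartSalesRankings PSR;

-- the view then reduces to a single pass; the windowed sum without
-- ORDER BY works on SQL Server 2005 and up
select CompanyPartNumber,
       CumTotal * 100.0 / sum(PartTotal) over () as Cum_PC_Of_Total
from PartSalesRankings;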

Related

Counts' division doesn't work in full code

I have a problem with a task because my division value is different when I use it alone and when I use it in the full query. Let's say I run this code:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
and the number I get is 2.(6)7, but when I use it in the full query, which is:
SELECT Stud.Egzempliorius.Paimta, COUNT(PAIMTA) as PaimtaKnyga
FROM Stud.Skaitytojas, Stud.Egzempliorius
WHERE Stud.Skaitytojas.Nr=Stud.Egzempliorius.Skaitytojas
GROUP BY Stud.Egzempliorius.Paimta
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
its value changes because the division isn't working anymore, and instead of having
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
my code effectively turns into
HAVING count(paimta) > (count(paimta))::numeric;
and these values are equal, so I can't get the final answer. That's the database I use: https://klevas.mif.vu.lt/~baronas/dbvs/biblio/show-table.php?table=Stud.Egzempliorius
I've been struggling with this for 10 hours now and I've finally lost my patience... So, my question is: what do I have to do so that this code:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
doesn't change its value in the full query?
Your solution fails because the two queries operate on different groups of rows: the first query does its computation over the whole dataset, while the second one groups by paimta.
One option would have been window functions, but as far as I know Postgres does not support count(distinct) as a window function.
I think the simplest approach is to use a subquery:
select e.paimta, count(paimta) as paimtaknyga
from stud.skaitytojas s
inner join stud.egzempliorius e on s.nr = e.skaitytojas
group by e.paimta
having count(paimta) > (
    select (count(paimta))::numeric / count(distinct paimta)
    from stud.egzempliorius
)
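If the subquery in HAVING feels awkward, here is a sketch of an equivalent formulation (same tables, just reorganized) that computes the threshold once in a CTE (here called threshold, name arbitrary) and cross joins it to the grouped query; since the CTE returns a single row, the cross join simply attaches that value to every group:
with threshold as (
    select count(paimta)::numeric / count(distinct paimta) as average
    from stud.egzempliorius
)
select e.paimta, count(paimta) as paimtaknyga
from stud.skaitytojas s
inner join stud.egzempliorius e on s.nr = e.skaitytojas
cross join threshold t
group by e.paimta, t.average
having count(paimta) > t.average;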

SQL - Counting Week Data and Displaying It as Columns

The website I am working on has an SQL report that has been requested as the basis for a new report. The old report takes the usage records of items within the date range and uses them to calculate the total weight, with the items split into their respective item groups so that each group's total weight usage can be shown as well.
I have cut the code down to its basic form to show how it currently works:
with td as(
select
g.description as "ItemGroup"
,u.itemcode as "ItemCode"
,i.description as "ItemDescription"
,sum(cast(i.weight as real) * usage) as "Weight"
from
usage u
inner join
items i
on i.code = u.itemcode
inner join
groups g
on g.code = i.groupcode
where
date(updatedate) > date(#FROM) and date(updatedate) <= date(#TO)
group by g.description,u.itemcode,i.description)
select 'Detail' as "type",td.* from td
union
select 'Group Total' as "type",itemgroup,null,sum(weight) from td group by itemgroup
union
select 'Total' as "type",'' as itemgroup,null,sum(weight) from td
order by type, itemcode
However, the new report they requested expands this to also show the individual week usage totals as columns (Week 1, 2, ..., 43). The problem I see with this is that the SQL would have to know how many weeks are within the date range in order to create the columns, and then use the equivalent of a where clause to find the values within each week's date range. I don't know if that is even possible with SQL.
I had a look on Google and could not find anything related to the matter. I would like to be proven wrong, as that would be a powerful tool to have in SQL. Can someone tell me if this is possible?
I don't believe this is possible without writing dynamic SQL, unless you can rely on there being an upper limit. If you can prepare enough columns to cover that upper limit, you can do something like:
SELECT
    g.description,
    u.itemcode,
    i.description,
    -- week index counted from #FROM: whole days divided by 7, truncated to an integer
    SUM(CASE
            WHEN CAST((julianday(date(updatedate)) - julianday(date(#FROM))) / 7 AS INTEGER) = 0
            THEN CAST(i.weight AS REAL) * usage
            ELSE 0.0
        END) AS Week1Weight,
    SUM(CASE
            WHEN CAST((julianday(date(updatedate)) - julianday(date(#FROM))) / 7 AS INTEGER) = 1
            THEN CAST(i.weight AS REAL) * usage
            ELSE 0.0
        END) AS Week2Weight,
    SUM(CASE
            WHEN CAST((julianday(date(updatedate)) - julianday(date(#FROM))) / 7 AS INTEGER) = 2
            THEN CAST(i.weight AS REAL) * usage
            ELSE 0.0
        END) AS Week3Weight,
And so on...
I'm not familiar with SQLite so I may have gotten some syntax wrong.
You can use the strftime function to get the week number from the date column, like:
strftime('%W',date(updatedate)) > #FROM
AND
strftime('%W',date(updatedate)) <= #TO
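Building on the first answer's julianday arithmetic, here is a rough sketch (SQLite, using the same usage/items/groups tables and #FROM/#TO parameters as above) that buckets each row into a week counted from #FROM and pivots with conditional aggregation; the column list still has to be extended, or generated dynamically, up to the maximum number of weeks you need to support:
select g.description as ItemGroup,
       u.itemcode as ItemCode,
       i.description as ItemDescription,
       -- one conditional sum per week bucket
       sum(case when cast((julianday(date(updatedate)) - julianday(date(#FROM))) / 7 as integer) = 0
                then cast(i.weight as real) * usage else 0.0 end) as Week1Weight,
       sum(case when cast((julianday(date(updatedate)) - julianday(date(#FROM))) / 7 as integer) = 1
                then cast(i.weight as real) * usage else 0.0 end) as Week2Weight
from usage u
inner join items i on i.code = u.itemcode
inner join groups g on g.code = i.groupcode
where date(updatedate) > date(#FROM) and date(updatedate) <= date(#TO)
group by g.description, u.itemcode, i.description;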

SQL Percentage of Occurrences

I'm working on some SQL code as part of my university work. The data is fictitious, just to be clear. I'm trying to count the occurrences of 1 and 0 in the table Fact_Stream; the value is stored in the Free_Stream column as a Boolean/bit value.
As calculations can't be made on bit values (at least in the way I'm trying), I've converted the value to an integer -- just to be clear on that. The table contains information on a streaming company's streams: a 1 indicates the stream was free of charge, a 0 indicates the stream was paid for. My code:
SELECT Fact_Stream.Free_Stream, ((CAST(Free_Stream AS INT)) / COUNT(*) * 100) As 'Percentage of Streams'
FROM Fact_Stream
GROUP BY Free_Stream
The result/output is nearly where I want it to be, but it doesn't display the percentage correctly.
I'm using MS SQL Management Studio | MS SQL Server 2012 (I believe).
The percentage should be based on all rows, so you need to divide the count per 1/0 by a count of all rows. The easiest way to get this is utilizing a Windowed Aggregate Function:
SELECT Fact_Stream.Free_Stream,
100.0 * COUNT(*) -- count per bit
/ SUM(COUNT(*)) OVER () -- sum of those counts = count of all rows
As "Percentage of Streams"
FROM Fact_Stream
GROUP BY Free_Stream
You have INTs as the divisor and the dividend (not sure I have the naming right), so the result is also an INT. Just cast one of them to decimal (notice how I changed 100 to 100.0). Also, you should divide the count of elements in each group by the total count of rows in the table:
select Free_Stream,
       count(*) * 100.0 / (select count(*) from Fact_Stream) as 'Percentage of Streams'
from Fact_Stream
group by Free_Stream
Your equation is dividing the identifier (1 or 0) by the number of streams for each one, instead of dividing the count of free or paid by the total count. One way to do this is to get the total count first, then use it in your query:
declare #totalcount real;
select #totalcount = count(*) from Fact_Stream;
SELECT Fact_Stream.Free_Stream,
(Cast(Count(*) as real) / #totalcount)*100 AS 'Percentage of Streams'
FROM Fact_Stream
group by Fact_Stream.Free_Stream

Total Count in Grouped TSQL Query

I have a performance-heavy query that filters out many unwanted records based on data in other tables, etc.
I am averaging a column and also returning the count for each averaged group. This is all working fine.
However, I would also like to include each group's percentage of the TOTAL count.
Is there any way of getting this total count without rerunning the whole query, or increasing the performance load significantly?
I would also prefer not to completely restructure the subquery (e.g. by getting the total count outside of it), but I can do so if necessary.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(data.*) AS BinCount,
COUNT(data.*)/ ???TotalCount??? AS BinCountPercentage
FROM
(SELECT * FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId
See Window functions.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*)/ cast (cnt as float) AS BinCountPercentage
FROM
(SELECT *,
-- Here is total count of records
count(*) over() cnt
FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId, cnt
EDIT: forgot to actually divide the numbers.
Another approach:
with data as
(
SELECT * FROM MultipleTablesWithJoins
)
,grand as
(
select count(*) as cnt from data
)
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*) / cast(grand.cnt as float) AS BinCountPercentage
FROM data cross join grand
GROUP BY data.EquipmentId, grand.cnt

Simplify query in H2 database - alternative to TOP X PERCENT

I'm having performance issues with a query and was wondering how to simplify it.
I have a table "Evaluations" (Sample, Category, Jury, Value) and created some custom functions to get some average values for each sample, so I have this view:
CREATE VIEW Results AS
SELECT Sample,
Category,
IFNULL(COUNT_VALID(Value),0) || ' / ' || COUNT(Value) AS Valid,
CUSTOM_MEAN(Value) AS Mean,
CUSTOM_MEDIAN(Value) AS Median
FROM Evaluations GROUP BY Sample, Category;
Then I want to have another field telling me whether each sample is within the 30% best-valued samples of its category. It would be perfect to use TOP(X) PERCENT, but it seems H2 doesn't support it, so I made a second view that calculates the position in the category, multiplies it by 100, divides it by the total count in the category and compares the result to 30:
CREATE VIEW Res AS
SELECT R1.*,
CASE
WHEN (
((SELECT COUNT(*) FROM Results R2
WHERE R2.Category = R1.Category
AND (R2.Mean > R1.Mean OR (R2.Mean = R1.Mean AND R2.Median > R1.Median))) + 1) * 100
/
(SELECT COUNT(*) FROM Results R2 WHERE R2.Category = R1.Category) )
> 30
THEN 'over 30%'
ELSE 'within 30%'
END as "30PERCENT"
FROM Results R1 ORDER BY Mean DESC, Median DESC;
This works properly but with just 500 records it takes some time to retrieve the results.
Could someone tell me a more efficient way of constructing this query?
Thanks and regards!
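For what it's worth, here is a sketch of an alternative, assuming a recent H2 version with window function support (added around H2 1.4.198): RANK() reproduces the "rows with a better Mean/Median, plus one" numerator and COUNT(*) OVER the category size, so both correlated subqueries disappear:
CREATE VIEW Res AS
SELECT R1.*,
       CASE
           -- rank within category * 100 / category size, same 30% cutoff as above
           WHEN RANK() OVER (PARTITION BY Category
                             ORDER BY Mean DESC, Median DESC) * 100.0
                / COUNT(*) OVER (PARTITION BY Category) > 30
           THEN 'over 30%'
           ELSE 'within 30%'
       END AS "30PERCENT"
FROM Results R1 ORDER BY Mean DESC, Median DESC;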