Calculating percentages with SQL - sql

I'm trying to create an SQL query to work out the percentage of rows given its number of play counts.
My DB currently has 800 rows of content,
All content has been played a total of 3,000,000 times put together
table:
id, play_count, content
Lets say I'd like to work out the percentage of the first 10 rows.
My attempts have looked similar to this:
SELECT COUNT(*) AS total_content,
SUM(play_count) AS total_played,
content.play_count AS content_plays
FROM bebo_video
How would I put this all together to show a final percentage on each individual row??

SELECT play_count / (SELECT SUM(play_count) FROM bebo_video) * 100 FROM bebo_video
Use ROUND, TRUNCATE, etc. to format the resulting values.

Related

SQL: getting rows where some 'x' column value is a maximal one [duplicate]

This question already has answers here:
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
(22 answers)
Closed 11 months ago.
I am trying to get data from my database.
Query upon sub-query upon another sub-query - and as the intermediate result I get looks like this:
item
quantity
pen
34
pencil
42
notebook
42
eraser
12
I need to build another query upon this result set to get the rows where item_quantity has it's maximal value (42 in the example above). The rows with pencils and notebooks. However, I have found out that the task is a bit trickier than I expected it to be.
SELECT * FROM sub_query_result HAVING quantity = MAX(quantity)
always returns an empty result set
SELECT * FROM sub_query_result HAVING quantity = 42
is pointless since I need to know the exact max quantity in advance
SELECT * FROM sub_query_result WHERE quantity = MAX(quantity)
simply works not ("Invalid use of group function")
I can see solutions that work but that I do not like -- due to extra actions I need to take on my back-end code that executes this sql request, or due to their inefficiency:
I can create a temporary table, get max. quantity from it and place to a variable. Then I can use this variable inside the query to that temporary table and get the data I need.
I can do
SELECT * FROM query_result HAVING quantity = (SELECT MAX(quantity) FROM
<Query upon sub-query upon another sub-query that shall return query_result>)
but that way I request the very same data twice! which in general is not a good approach.
So... Anything I missed? Any simple and elegant solutions that can solve my problem?
Order by quantity descending to get the max values first. Only select the first row, i.e. the one having the max values. ANSI SQL version:
SELECT * FROM query_result
ORDER BY quantity DESC
FETCH FIRST 1 ROW WITH TIES
WITH TIES means you can get several rows, if there are several rows having the same maximum quantity.

SQL Aggregate Function over partitions

I'm relatively new to SQL but have learned some cool stuff. I'm getting results that don't make sense. I've got a query with several subqueries and what-not but I have a windowed function that isn't working like I'm expecting.
The part that isn't working is this (simplified from the 300 line query):
SELECT AVG(table.sales_amount)
OVER (PARTITION BY table.month, table.sales_rep, table.department)
FROM table
The problem is that when I pull the data non aggregated I get a value different (107) than the above returns (95).
I've used windowed functions for COUNT and SUM and they work fine, but AVG is acting strangely. Am I missing something about how this works with AVG?
The subquery that table is a standin for looks like:
sales_rep, month, department, sales_amount
1, 2017-1, abc, 125.20
1, 2017-2, abc, 120.00
2, 2017-1, def, 100.00
...etc
Working out of Sql Server Management studio
SOLVED: I did finally figure it out, the results i was joining this subquery to had the sales rep multiple times in a month selling objects A&B which caused whoever sold both to be counted twice. whoops, my bad.
The results that you get should be the same values as in:
SELECT AVG(table.sales_amount)
FROM table
GROUP BY table.month, table.sales_rep, table.department;
Of course, the rows will be different. You need to match up the three key columns.
Based on your sample data, it looks like the partitioning keys uniquely define each row. Perhaps you really intend:
SELECT AVG(table.sales_amount) OVER () as overall_average
FROM table;
EDIT:
For the departmental average:
SELECT AVG(table.sales_amount) OVER (partition by table.department) as department_average
FROM table;
After some bruteforcing of potential errors I finally figured out the issue. I was joining that subquery to the another which had multiple instances of a sales_rep in a given month (selling objects a & b) which caused the average of those with sales of both objects to be counted twice instead of once.
so sales rep 1 sold objects a & b which made his avg count as 66% of the dept avg instead of 50%, and sales rep 2 count only 33%.

SQL COUNT - greater than some number without having to get the exact count?

There's a thread at https://github.com/amatsuda/kaminari/issues/545 talking about a problem with a Ruby pagination gem when it encounters large tables.
When the number of records is large, the pagination will display something like:
[1][2][3][4][5][6][7][8][9][10][...][end]
This can incur performance penalties when the number of records is huge, because getting an exact count of, say, 50M+ records will take time. However, all that's needed to know in this case is that the count is greater than the number of pages to show * number of records per page.
Is there a faster SQL operation than getting the exact COUNT, which would merely assert that the COUNT is greater than some value x?
You could try with
SQL Server:
SELECT COUNT(*) FROM (SELECT TOP 1000 * FROM MyTable) X
MySQL:
SELECT COUNT(*) FROM (SELECT * FROM MyTable LIMIT 1000) X
With a little luck, the SQL Server/MySQL will optimize this query. Clearly instead of 1000 you should put the maximum number of pages you want * the number of rows per page.

How to accurately figure percentage of a finite total

I’m in the process of creating a report that will tell end users what percentage of a gridview (the total number of records is a finite number) has been completed in a given month. I have a gridview with records that I’ve imported and users have to go into each record and update a couple of fields. I’m attempting to create a report tell me what percentage of the grand total of records was completed in a given month. All I need is the percentage. The grand total is (for this example) is 2000.
I’m not sure if the actual gridview information/code is needed here but if it does, let me know and I’ll add it.
The problem is that I have been able to calculate the percentage total but when its displayed the percentage total is repeated for every single line in the table. I’m scratching my head on how to make this result appear only once.
Right now here’s what I have for my SQL code (I use nvarchar because we import from many non windows systems and get all sorts of extra characters and added spaces to our information):
Declare #DateCount nvarchar(max);
Declare #DivNumber decimal(5,1);
SET #DivNumber = (.01 * 2541);
SET #DateCount = (SELECT (Count(date_record_entered) FROM dbo.tablename WHERE date_record_entered IS NOT NULL and date_record_entered >= 20131201 AND date_record_entered <= 20131231);
SELECT CAST(ROUND(#DivNumber / #DateCount, 1) AS decimal(5,1) FROM dbo.tablename
Let’s say for this example the total number of records in the date_record_entered for the month of December is 500.
I’ve tried the smaller pieces of code separately with no success. This is the most recent thing I’ve tried.
I know I'm missing something simple here but I'm not sure what.
::edit::
What I'm looking for as the expected result of my query is to have a percentage represented of records modified in a given month. If 500 records were done that would be 25%. I just want to have the 25 (and trainling decimal(s) when it applies) showing once and not 25 showing for every row in this table.
The following query should provide what you are looking for:
Declare #DivNumber decimal(5,1);
SET #DivNumber = (.01 * 2541);
SELECT
CAST(ROUND(#DivNumber / Count(date_record_entered), 1) AS decimal(5,1))
FROM dbo.tablename
WHERE date_record_entered IS NOT NULL
and date_record_entered >= 20131201
AND date_record_entered <= 20131231
Why do you select the constant value cast(round(#divNumber / #DateCount, 1) as decimal(5,1)from the table? That's the cause of your problem.
I'm not too familiar with sql server, but you might try to just select without a from clause.
select emp.Natinality_Id,mstn.Title as Nationality,count(emp.Employee_Id) as Employee_Count,
count(emp.Employee_Id)* 100.0 /nullif(sum(count(*)) over(),0) as Nationality_Percentage
FROM Employee as emp
left join Mst_Natinality as mstn on mstn.Natinality_Id = emp.Natinality_Id
where
emp.Is_Deleted='false'
group by emp.Natinality_Id,mstn.Title

Select Average of Top 25% of Values in SQL

I'm currently writing a stored procedure for my client to populate some tables that will be used to generate SSRS reports later on. Some of the data is based on specific stock formulas that are run on each of their clients' quarterly data (sent to them by their clients). The other part of the data is generated by comparing those results against those from other, similar sized clients. One of the things that they want tracked in their reports is the average of the top 25% of formula results for that particular comparison group.
To give a better picture of it, imagine the following fields that I have in a temp table:
FormulaID int
Value decimal (18,6)
I want to do the following: Given a specific FormulaID return the average of the top 25% of Value.
I know how to take an average in SQL, but I don't know how to do it against only the top 25% of a specific group.
How would I write this query?
I guess you can do something like this...
SELECT AVG(Q.ColA) Avg25Prec
FROM (
SELECT TOP 25 Percent ColA
FROM Table_Name
ORDER BY SomeCOlumn
) Q
Here's what I did, given the table shown above:
select AVG(t.Value)
from (select top 25 percent Value
from #TempGroupTable
where FormulaID = #PassedInFormulaID
order by Value desc) as t
The desc must be there, because the percent command will not actually do comparisons. It will just simply grab the first x number of records, with x being equal to 25% of the count of records it's querying. Therefore, the order by Value desc line then will grab the top 25% records which have the highest Value, and then sends that info to be averaged.
As a side note to all of this, this also means that if you wanted to grab the bottom 25% instead, or if your formula results are like a golf score (i.e. lowest is the best), all you would need to do is remove the desc part and you would be good to go.