POSTGRESQL: Weighted average instead of average?

I have a query that returns a list of averages:
SELECT tv.id, AVG(ut.rating) FROM user_tvshow AS ut
LEFT JOIN tvshows AS tv ON tv.id = ut.tvshow
WHERE "user" IN (
SELECT follows FROM user_follows WHERE "user" = 1 -- List of users the current user follows
) AND rating IS NOT NULL GROUP BY tv.id;
At the moment it averages the results as expected. Is there any way to weight this average with the number of rows in the group? So that one row of rating 10 won't appear higher than 100 rows of rating 9.

This is not what a weighted average is. It sounds like you are after a Bayesian average, where you penalize a small set by moving its observed average towards some meta-average. There is no built-in way to do this in PostgreSQL.
Compute the sum and count separately, and then use some mechanism to implement the penalty based on those values. You could do that in the client, or you could write an outer query which takes the results of the subquery and applies the formula.
select id, (the_sum + 10.0 * <metaaverage>) / (the_count + 10) as adjusted_avg -- 10.0 avoids integer division
from (
    SELECT tv.id, sum(ut.rating) AS the_sum, count(ut.rating) AS the_count
    FROM user_tvshow AS ut
    LEFT JOIN tvshows AS tv ON tv.id = ut.tvshow
    WHERE "user" IN (
        SELECT follows FROM user_follows WHERE "user" = 1 -- List of users the current user follows
    ) AND rating IS NOT NULL
    GROUP BY tv.id
) foobar
Which values to plug in for the 10 and for <metaaverage> is a question of statistics, not programming.
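As a rough illustration of the formula above, here is a minimal sketch in Python that runs the same kind of query against an in-memory SQLite database rather than PostgreSQL. The sample ratings, the prior weight of 10, and the choice of the overall mean as the meta-average are all assumptions made for the demo, not recommendations.

```python
import sqlite3

# Hypothetical sample data: show 1 has a single rating of 10,
# show 2 has one hundred ratings of 9, show 3 has fifty ratings of 5.
rows = [(1, 10)] + [(2, 9)] * 100 + [(3, 5)] * 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_tvshow (tvshow INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO user_tvshow (tvshow, rating) VALUES (?, ?)", rows)

PRIOR_WEIGHT = 10  # assumed: how many "phantom" ratings the meta-average counts for

# Bayesian average = (sum + C * m) / (count + C), where C is PRIOR_WEIGHT
# and m is the meta-average (assumed here to be the mean of all ratings).
query = """
SELECT tvshow,
       (SUM(rating) + :c * (SELECT AVG(rating) FROM user_tvshow))
           / (COUNT(rating) + :c) AS bayes_avg
FROM user_tvshow
WHERE rating IS NOT NULL
GROUP BY tvshow
ORDER BY bayes_avg DESC
"""
ranking = conn.execute(query, {"c": PRIOR_WEIGHT}).fetchall()

for tvshow, bayes_avg in ranking:
    print(tvshow, round(bayes_avg, 3))
```

With this data the single rating of 10 is pulled strongly towards the overall mean, so the show with one hundred ratings of 9 ranks first. A larger prior weight pulls small groups harder; a weight of 0 gives back the plain average.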

Related

SQL Pivot Total Column

I am building an NFL Pickem application. The query in question should show a list of all players in the league along with the team they picked to win each game in a given week. This sample is hard coded for week 1 to keep it simpler and focus on my main question.
I am getting stuck trying to add an additional column for Total Points. This Total Points column should perform a calculation based on the Pick.ConfidencePoints column for each player for the given week.
The query below is working the way I want except for the Total Points column.
Whenever I try to add that column things get messed up.
The query currently produces results that look like this:
Here is the current query:
SELECT Player, [1],[2],[3]
FROM
(SELECT
Player.Name AS Player,
Game.Week,
Team.CityShort,
Game.ID AS GameId
FROM Pick
LEFT JOIN Player ON Pick.PlayerId = Player.Id
LEFT JOIN Team ON Pick.PickedWinnerTeamId = Team.Id
LEFT JOIN Game ON Pick.GameId = Game.Id
WHERE Game.Week = 1
GROUP BY Player.Name, Game.Week, Team.CityShort, Game.Id) AS SourceData
PIVOT
(
MAX (CityShort)
FOR GameId IN ([1],[2],[3])
) AS PivotTable

SQL Calculations over tables

There are two tables, and the expected result is the total cost of each engagement. Multiple tests are taken during each engagement, and each test has a fixed cost. The result must be expressed in terms of EngagementId, EngagementCost.
The two tables, with their respective fields:
- EngagementTest (EngagementId, TestId)
- Test (TestId, TestCost)
How would one go about calculating the cost of each engagement?
This is as far as I managed to get:
SELECT EngagementId, COUNT(TESTId)
FROM EngagementTest
GROUP BY EngagementId;

Try a SUM of the TestCost column rather than a COUNT. COUNT just tells you the number of rows. SUM adds up the values within the rows and gives you a total. Also your existing query doesn't actually use the table that contains the cost data. You can INNER JOIN the two tables via TestId and then GROUP BY the EngagementId so you get the sum of each engagement.
Something like this:
SELECT
ET.EngagementId,
SUM(T.TestCost)
FROM
EngagementTest ET
INNER JOIN Test T
ON T.TestId = ET.TestId
GROUP BY
ET.EngagementId
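To see the difference between COUNT and SUM on concrete rows, here is a small sketch in Python using an in-memory SQLite database. The table and column names follow the question; the test costs and engagement ids are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (TestId INTEGER PRIMARY KEY, TestCost REAL);
CREATE TABLE EngagementTest (EngagementId INTEGER, TestId INTEGER);
-- Hypothetical sample data
INSERT INTO Test VALUES (1, 50.0), (2, 120.0), (3, 30.0);
INSERT INTO EngagementTest VALUES (100, 1), (100, 2), (200, 2), (200, 3), (200, 3);
""")

# COUNT gives the number of tests per engagement; SUM over the joined
# Test table gives the total cost of those tests.
rows = conn.execute("""
SELECT ET.EngagementId,
       COUNT(ET.TestId) AS TestCount,
       SUM(T.TestCost)  AS EngagementCost
FROM EngagementTest ET
INNER JOIN Test T ON T.TestId = ET.TestId
GROUP BY ET.EngagementId
ORDER BY ET.EngagementId
""").fetchall()

for engagement_id, test_count, engagement_cost in rows:
    print(engagement_id, test_count, engagement_cost)
```

Engagement 200 takes test 3 twice, and the join counts its cost twice, which is presumably what is wanted when a test is repeated.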

This can be achieved with the query below.
SELECT i.EngagementId, SUM(t.TestCost) AS EngagementCost
FROM EngagementTest i
INNER JOIN Test t
ON i.TestId = t.TestId
GROUP BY i.EngagementId

Nested subquery in Access alias causing "enter parameter value"

I'm using Access (I normally use SQL Server) for a little job, and I'm getting "enter parameter value" for Night.NightId in the statement below that has a subquery within a subquery. I expect it would work if I wasn't nesting it two levels deep, but I can't think of a way around it (query ideas welcome).
The scenario is pretty simple, there's a Night table with a one-to-many relationship to a Score table - each night normally has 10 scores. Each score has a bit field IsDouble which is normally true for two of the scores.
I want to list all of the nights, with a number next to each representing how many of the top 2 scores were marked IsDouble (would be 0, 1 or 2).
Here's the SQL, I've tried lots of combinations of adding aliases to the column and the tables, but I've taken them out for simplicity below:
select Night.*
,
( select sum(IIF(IsDouble,1,0)) from
(SELECT top 2 * from Score where NightId=Night.NightId order by Score desc, IsDouble asc, ID)
) as TopTwoMarkedAsDoubles
from Night

This is a bit of speculation. However, some databases have issues with correlation conditions in multiply nested subqueries. MS Access might have this problem.
If so, you can solve this by using aggregation with a where clause that chooses the top two values:
select s.nightid,
sum(IIF(IsDouble, 1, 0)) as TopTwoMarkedAsDoubles
from Score as s
where s.id in (select top 2 s2.id
from score as s2
where s2.nightid = s.nightid
order by s2.score desc, s2.IsDouble asc, s2.id
)
group by s.nightid;
If this works, it is a simple matter to join Night back in to get the additional columns.
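The approach above, including the join back to Night, can be sketched as follows. This is only an illustration run from Python against an in-memory SQLite database, not Access: it assumes LIMIT 2 in place of TOP 2 and CASE in place of IIF, and the sample nights and scores are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Night (NightId INTEGER PRIMARY KEY);
CREATE TABLE Score (Id INTEGER PRIMARY KEY, NightId INTEGER,
                    Score INTEGER, IsDouble INTEGER);
-- Hypothetical sample data; night 4 has no scores at all
INSERT INTO Night VALUES (1), (2), (3), (4);
INSERT INTO Score VALUES
  (1, 1, 90, 1), (2, 1, 80, 0), (3, 1, 70, 1),
  (4, 2, 50, 1), (5, 2, 60, 1), (6, 2, 10, 0),
  (7, 3, 40, 0), (8, 3, 30, 0), (9, 3, 20, 1);
""")

# Keep only each night's top two scores (correlated one level deep),
# count how many of them are doubles, then join Night back in.
rows = conn.execute("""
SELECT n.NightId, COALESCE(t.TopTwoMarkedAsDoubles, 0) AS TopTwoMarkedAsDoubles
FROM Night AS n
LEFT JOIN (
    SELECT s.NightId,
           SUM(CASE WHEN s.IsDouble THEN 1 ELSE 0 END) AS TopTwoMarkedAsDoubles
    FROM Score AS s
    WHERE s.Id IN (SELECT s2.Id
                   FROM Score AS s2
                   WHERE s2.NightId = s.NightId
                   ORDER BY s2.Score DESC, s2.IsDouble ASC, s2.Id
                   LIMIT 2)
    GROUP BY s.NightId
) AS t ON t.NightId = n.NightId
ORDER BY n.NightId
""").fetchall()

for night_id, doubles in rows:
    print(night_id, doubles)
```

The LEFT JOIN with COALESCE keeps nights that have no scores and reports 0 for them.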

Your subquery can only see one level above it, so Night.NightId is unknown to it; that is why you are being prompted to enter a value. You can use a GROUP BY to get the value you want for each NightId and then join that back to the original Night table.
Select *
From Night
left join (
Select N.NightId
, sum(IIF(S.IsDouble,1,0)) as [Number of Doubles]
from Night N
inner join Score S
on S.NightId = N.NightId
group by N.NightId) NightsWithScores
on Night.NightId = NightsWithScores.NightId
Because of the IIF(S.IsDouble,1,0), I don't see the point in using top.

Using SUM() with multiple tables in sql

I am trying to write a query that uses the SUM function to add up all values in one column and then divides by the count of tuples in another table. When I run the sum query by itself I get the correct number back, but when I use it in the query below the value is wrong.
This is what I'm trying to do:
select (sum(adonated) / count(p.pid)) as "Amount donated per Child"
from tsponsors s, player p;
I found out the issue is in the sum. The query below returns 650,000 when it should return 25,000:
select (sum(adonated)) as "Amount donated per Child"
from tsponsors s, player p;
If I remove the from player p, it returns the correct amount. However, I need the player table to get the number of players.
I have 3 tables that are related to this query.
player(pid, tid(fk))
team(tid)
tsponsors(tid(fk), adonated, sid(fk)) this is a joining table
What I want is the sum of all the amounts donated to each team, sum(adonated), divided by the number of players in the database, count(pid).

I guess your sponsors are giving amounts to teams. You then want to know the proportion of donations per child in the sponsored team.
You would then need something like this:
SELECT p.tid,(SUM(COALESCE(s.adonated,0)) / COUNT(p.pid)) AS "Amount donated per Child"
FROM player p
LEFT OUTER JOIN tsponsors s ON s.tid=p.tid
GROUP BY p.tid
I also used a LEFT OUTER JOIN in order to show $0 if a team has no sponsors.

Try
select sum(s.adonated) / (SELECT count(p.pid) FROM player p)
as "Amount donated per Child"
from tsponsors s;
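Your original query joins two tables without any condition, which results in a cross join: every sponsor row is paired with every player row, so the sum is multiplied by the number of players.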
UPDATE
SELECT ts.tid, SUM(ts.adonated),num_plyr
FROM tsponsors ts
INNER JOIN
(
SELECT tid, COUNT(pid) as num_plyr
FROM player
GROUP BY tid
)a ON (a.tid = ts.tid)
GROUP BY ts.tid,num_plyr
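The effect of the cross join can be reproduced with a small sketch in Python against an in-memory SQLite database. The data is hypothetical: three donations totalling 25,000 and 26 players, chosen only because 25,000 × 26 gives the 650,000 reported in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (pid INTEGER PRIMARY KEY, tid INTEGER);
CREATE TABLE tsponsors (tid INTEGER, adonated REAL, sid INTEGER);
-- Hypothetical donations adding up to 25000
INSERT INTO tsponsors VALUES (1, 10000, 1), (1, 5000, 2), (2, 10000, 3);
""")
# 26 hypothetical players spread over two teams
conn.executemany("INSERT INTO player (pid, tid) VALUES (?, ?)",
                 [(pid, 1 + pid % 2) for pid in range(1, 27)])

# Comma join without a condition: each donation is repeated once per player.
inflated = conn.execute(
    "SELECT SUM(adonated) FROM tsponsors s, player p").fetchone()[0]

# Summing tsponsors on its own gives the real total.
correct = conn.execute("SELECT SUM(adonated) FROM tsponsors").fetchone()[0]

# Scalar subquery for the player count keeps the two tables from multiplying.
per_child = conn.execute("""
SELECT SUM(s.adonated) / (SELECT COUNT(p.pid) FROM player p)
FROM tsponsors s
""").fetchone()[0]

print(inflated, correct, round(per_child, 2))
```

Note that adonated is declared REAL here; with integer columns the division would truncate in SQLite and in several other databases.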

filter out deviating record with sql

We have a set of data for which we need the average of a column. A select avg(x) from y does the trick. However, we need a more accurate figure.
I figure there must be a way of filtering out records that have values that are either too high or too low (spikes), so that we can exclude them when calculating the average.

There are three common types of average (mean, median, and mode), and what you are currently using is the mean: the sum of all the values divided by the number of values.
You might find it more useful to get the mode, the most frequently occurring value:
select name,
(select top 1 h.run_duration
from sysjobhistory h
where h.step_id = 0
and h.job_id = j.job_id
group by h.run_duration
order by count(*) desc) run_duration
from sysjobs j
If you did want to get rid of any values more than one standard deviation from the mean, you could find the average and the standard deviation in a subquery, eliminate the values that fall outside the range average ± standard deviation, and then take a further average of the remaining values. However, you start running the risk of producing meaningless values:
select oh.job_id, avg(oh.run_duration) as avg_duration
from sysjobhistory oh
inner join (select h.job_id, avg(h.run_duration) avgduration,
                   stdev(h.run_duration) stdev_duration
            from sysjobhistory h
            where h.step_id = 0 -- same filter as the outer query, so both describe the same rows
            group by h.job_id) as m on m.job_id = oh.job_id
where oh.step_id = 0
and abs(oh.run_duration - m.avgduration) < m.stdev_duration
group by oh.job_id
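The same filter can be tried on the table y and column x from the question. The sketch below uses Python with an in-memory SQLite database; because SQLite has no STDEV, a small user-defined aggregate (the hypothetical Stdev class) stands in for it, and the sample values, including the spike of 95, are invented.

```python
import sqlite3
import statistics


class Stdev:
    """Sample standard deviation, standing in for SQL Server's STDEV()."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("stdev", 1, Stdev)
conn.execute("CREATE TABLE y (x REAL)")
conn.executemany("INSERT INTO y (x) VALUES (?)",
                 [(v,) for v in [10, 11, 9, 10, 12, 10, 95]])

plain_avg = conn.execute("SELECT AVG(x) FROM y").fetchone()[0]

# Average only the rows within one standard deviation of the overall mean.
trimmed_avg = conn.execute("""
SELECT AVG(y.x)
FROM y
CROSS JOIN (SELECT AVG(x) AS avg_x, stdev(x) AS stdev_x FROM y) AS m
WHERE ABS(y.x - m.avg_x) < m.stdev_x
""").fetchone()[0]

print(round(plain_avg, 2), round(trimmed_avg, 2))
```

Here the spike drags the plain mean far above the typical values, and the filtered mean excludes it. On data without clear spikes, a one-standard-deviation cut-off can discard a large share of legitimate rows, which is the risk mentioned above.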

In SQL Server there's also the STDEV function, so maybe that can be of some help.
