How to calculate average using case when and distinct count? - sql

My source table contains sales information. Each row is a person and records every time they've shopped/where. I can therefore calculate the average transaction value per industry by the following:
select
industry,
COALESCE(AVG(CASE WHEN shopcode in (1,2,4) THEN dollar END), 0) AS avt
from sales
group by industry
But how can I adapt this to calculate the spend per distinct user, i.e. sum(dollar)/count(distinct person)? It is similar to the above, but instead of sum/count(*) I want sum/count(distinct person). I need to use coalesce with this as well.

how can i adapt this to calculate the spend per distinct count of user i.e.: sum(dollar)/count(distinct person)
You can use:
select industry,
sum(dollar) / count(distinct person)
from sales
group by industry;
I'm not sure what the filtering on shopcode is for. It is in your query but not part of the question. If you want this for particular shops, I would suggest moving this to a where clause:
select industry,
sum(dollar) / count(distinct person)
from sales
where shopcode in (1, 2, 4)
group by industry;
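Since the question also asks for coalesce, here is a minimal sketch of one way to keep the shop filter inside the aggregates, so that industries with no matching sales still appear with 0 instead of being dropped by the where clause. It uses Python's sqlite3 module only as a test harness; the sample rows and the spend_per_person alias are made up for illustration, and dollar is assumed to be a non-integer column (multiply by 1.0 otherwise to avoid integer division).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (industry TEXT, person TEXT, shopcode INTEGER, dollar REAL);
    INSERT INTO sales VALUES
        ('retail', 'ann', 1, 10.0),
        ('retail', 'ann', 2, 30.0),
        ('retail', 'bob', 4, 20.0),
        ('retail', 'cat', 9, 99.0),  -- shop 9 is outside the filter
        ('travel', 'dan', 9, 50.0);  -- travel has no sales in shops 1, 2, 4
""")

# NULLIF turns a zero person count into NULL so the division cannot fail;
# COALESCE then maps the resulting NULL to 0.
query = """
    SELECT industry,
           COALESCE(
               SUM(CASE WHEN shopcode IN (1, 2, 4) THEN dollar END)
               / NULLIF(COUNT(DISTINCT CASE WHEN shopcode IN (1, 2, 4) THEN person END), 0),
               0) AS spend_per_person
    FROM sales
    GROUP BY industry
    ORDER BY industry
"""
spend_per_person = dict(conn.execute(query).fetchall())
print(spend_per_person)
```

With these rows, retail comes out as 60 dollars over 2 distinct people, and travel falls back to 0.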

Related

Is there a better way to formulate this SQL query?

I am very new to the SQL universe, and I came across this prompt that I was able to fulfill, but I have to imagine I'm missing a more direct and intuitive solution. My solution returns the correct response in SQLite within rounding error to over 10 decimal places but technically does not match the reported solution. I appreciate any insight.
Prompt:
Find the difference between the average rating ["stars"] of movies released before 1980 and the average rating of movies released after 1980. (The difference between the average of averages before and after.)
The database includes 3 tables with the following columns (simplified for relevance):
movie| mID*, year
reviewer| rID*, name
rating| rID*, mID*, stars
"mavg" is my own aliased aggregation
select distinct(
(select avg(mavg)
from(
(select *, avg(stars) as mavg
from rating
group by mID) join movie using(mID) )
where year < 1980) -
(select avg(mavg)
from(
(select *, avg(stars) as mavg
from rating
group by mID) join movie using(mID) )
where year >= 1980)
)
from rating
;
Let's look at your subquery:
select *, avg(stars) as mavg
from rating
group by mID
This is an invalid query. With GROUP BY mid you say you want to aggregate your rows to get one result row per mID. But then you don't only select the average rating, but all columns from the table (SELECT *). One of these columns is stars. How can you select the stars column into one row, when there are many rows for an mID? Most DBMS report a syntax error here. SQLite picks one of the stars from any of the mID's rows arbitrarily instead. So, while this is considered valid in SQLite, it isn't in standard SQL, and you should not write such queries.
To the result (the average per movie) you join the movie table. And then you select the average of the movie ratings for the movies in the desired years. This is well done, but you could have put that restriction (join or IN clause or EXISTS clause) right into the subquery in order to only calculate the averages for the movies you want, rather than calculating all averages and then keeping only some of the movies and dismissing the others. But that's a minor detail.
Then you subtract the new average from the old one. This means you subtract one value from another and end up with exactly the one value you want to show. But instead of merely selecting this value (SELECT (...) - (...)) you are linking the value with the rating table (SELECT (...) - (...) FROM rating) for no apparent reason, thus selecting the desired value as often as there are rows in the rating table. You then notice this and apply DISTINCT to get rid of the rows you just created unnecessarily yourself. DISTINCT is very, very often an indicator of a badly written query. When you think you need DISTINCT, ask yourself what makes this necessary. Where do the duplicate rows come from? Have you created them yourself? Then amend this.
The query can be written thus:
select
avg(case when m.year < 1980 then r.movie_rating end) -
avg(case when m.year >= 1980 then r.movie_rating end) as diff
from
(
select mid, avg(stars) as movie_rating
from rating
group by mid
) r
join movie m using (mid);
Using a case expression inside an aggregation function is called conditional aggregation and is often the preferred solution when working with different aggregates.
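You may use the following single query here. Note that it averages the individual ratings rather than the per-movie averages, so its result can differ from the average of averages the prompt asks for when movies have different numbers of ratings: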
SELECT AVG(CASE WHEN m.year < 1980 THEN r.stars END) -
AVG(CASE WHEN m.year >= 1980 THEN r.stars END) AS mavg
FROM rating r
INNER JOIN movie m ON m.mID = r.mID;
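To see how the two formulations above relate, here is a small sketch that runs both against toy data in SQLite through Python's sqlite3 module. The rows are invented for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE rating (rID INTEGER, mID INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 1970), (2, 1975), (3, 1990);
    INSERT INTO rating VALUES
        (101, 1, 2), (102, 1, 4),   -- movie 1 averages 3
        (101, 2, 5),                -- movie 2 averages 5
        (101, 3, 4), (102, 3, 4);   -- movie 3 averages 4
""")

# Average of per-movie averages: every movie counts once.
avg_of_avgs = conn.execute("""
    SELECT AVG(CASE WHEN m.year < 1980 THEN r.movie_rating END) -
           AVG(CASE WHEN m.year >= 1980 THEN r.movie_rating END)
    FROM (SELECT mID, AVG(stars) AS movie_rating FROM rating GROUP BY mID) r
    JOIN movie m USING (mID)
""").fetchone()[0]

# Average over individual ratings: movies with more ratings weigh more.
avg_of_ratings = conn.execute("""
    SELECT AVG(CASE WHEN m.year < 1980 THEN r.stars END) -
           AVG(CASE WHEN m.year >= 1980 THEN r.stars END)
    FROM rating r
    JOIN movie m ON m.mID = r.mID
""").fetchone()[0]

print(avg_of_avgs, avg_of_ratings)
```

On this data the first query yields 4 - 4 = 0, while the second yields 11/3 - 4, because movie 1 contributes two ratings and movie 2 only one.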

PostgreSQL - max(count()) aggregation with group by

I have a table that contains unique transactions along with the year of the transaction and the employee who executed it. I need to find the employee with most transactions in each year.
I need a table with each year, the employee w/ most transactions in that year, and the number of transactions they had in that year.
This is as close as I am able to get without producing an error. I am unable to select the employee without producing an aggregation error.
select year, max(num_trans)
from (select year, employee, count(trans_id) as num_trans
from transactions
group by year, employee) as x
group by year
I am curious about how to work around this.
Use distinct on:
select distinct on (year) year, employee, count(*) as num_trans
from transactions
group by year, employee
order by year, count(*) desc;
distinct on is a handy Postgres extension to standard SQL that keeps the first row in a group of rows. The groups are defined by the distinct on key(s). Which row is first is determined by the order by.
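Because distinct on only exists in Postgres, it can help to see the same "first row per group" idea written with a window function. The sketch below is not the answer's query but a portable variant using row_number(), run in SQLite via Python's sqlite3 module (this assumes the bundled SQLite is 3.25 or newer, which added window functions). The sample rows and the aliases per_employee, ranked and rn are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (trans_id INTEGER PRIMARY KEY, year INTEGER, employee TEXT);
    INSERT INTO transactions (year, employee) VALUES
        (2020, 'ann'), (2020, 'ann'), (2020, 'bob'),
        (2021, 'bob'), (2021, 'bob'), (2021, 'bob'), (2021, 'ann');
""")

# Innermost query counts transactions per (year, employee); the middle query
# numbers the rows within each year by descending count; the outer query
# keeps only the top row of every year.
top_per_year = conn.execute("""
    SELECT year, employee, num_trans
    FROM (
        SELECT year, employee, num_trans,
               ROW_NUMBER() OVER (PARTITION BY year ORDER BY num_trans DESC) AS rn
        FROM (
            SELECT year, employee, COUNT(*) AS num_trans
            FROM transactions
            GROUP BY year, employee
        ) AS per_employee
    ) AS ranked
    WHERE rn = 1
    ORDER BY year
""").fetchall()
print(top_per_year)
```

Like distinct on, row_number() keeps a single row per year, so a tie between two employees is broken arbitrarily unless the ordering is extended; rank() could be used instead to keep all tied employees.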

Group by and Pivot functions giving different counts

Group by and Pivot operations give different counts.
I used Group by to get count of vehicles by City and used Pivot to get count of vehicles by Make.
SELECT MAKE, [AMB],[BNG],[CBE],[GBM],[KKE],[OMR],[PDR]
FROM
(
SELECT MAKE, BRANCH, COUNT(DISTINCT [VEH NO]) [VEHICLE COUNT]
FROM MAKE_MODEL_DESCRIPTION
GROUP BY MAKE, BRANCH
) X
PIVOT
(
SUM([VEHICLE COUNT]) FOR BRANCH IN ([AMB],[BNG],[CBE],[GBM],[KKE],
[OMR],[PDR])
) AS PVT
The total count I get for above Pivot query is 150.
select BRANCH, COUNT(distinct [VEH NO])
from MAKE_MODEL_DESCRIPTION
group by BRANCH
The total count I get for above GROUP BY query is 140.
Shouldn't both be the same number, given they are from the same data source?
Can someone let me know where I am going wrong?
No, you should not expect the counts to be the same. The GROUP BY is counting distinct vehicles per branch, over all makes. The PIVOT is summing counts of distinct vehicles within a single branch and make.
In other words, the same vehicle might appear under more than one make within a branch, in which case the pivot counts it once per make.
If you include the make, then the numbers should be the same:
select MAKE, BRANCH, COUNT(distinct [VEH NO])
from MAKE_MODEL_DESCRIPTION
group by MAKE, BRANCH
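A toy reproduction of the discrepancy, as a sketch in SQLite through Python's sqlite3 module rather than SQL Server. The three rows and the simplified column name veh_no (standing in for [VEH NO]) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE make_model_description (make TEXT, branch TEXT, veh_no TEXT);
    INSERT INTO make_model_description VALUES
        ('FORD',   'AMB', 'V1'),
        ('TOYOTA', 'AMB', 'V1'),  -- same vehicle recorded under a second make
        ('TOYOTA', 'AMB', 'V2');
""")

# Distinct vehicles per branch: V1 is counted once.
per_branch = conn.execute("""
    SELECT COUNT(DISTINCT veh_no)
    FROM make_model_description
    WHERE branch = 'AMB'
""").fetchone()[0]

# Distinct vehicles per (make, branch), then summed as the pivot does:
# V1 is counted once for FORD and once for TOYOTA.
per_make_and_branch_total = conn.execute("""
    SELECT SUM(vehicle_count)
    FROM (
        SELECT make, branch, COUNT(DISTINCT veh_no) AS vehicle_count
        FROM make_model_description
        GROUP BY make, branch
    ) AS x
    WHERE branch = 'AMB'
""").fetchone()[0]

print(per_branch, per_make_and_branch_total)
```

Here the branch has 2 distinct vehicles, but the per-make counts add up to 3.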

Count distinct while grouping in MS Access 2010

I have a database of sales and I want to be able to see what has sold during a particular time frame over the years i.e. see what sold most in the last 10 years between July 1 and July 15. Problem is that not all the items were sold every year, and I need to be able to get the average sold. I was able to count distinct years in MySQL but after migrating to Access I can't figure out how to count distinct while still grouping by the individual product.
Relevant fields are StockID (the items' unique id) and TransDate (a datetime that I pull the year from) I've tried things similar to SELECT Count(*) FROM (SELECT DISTINCT YEAR(Transdate) from Sales) inside my other query but that always gives me the count from all items (basically giving the number of years in the database) rather than a count for each item.
TL;DR I can either count distinct on the whole DB which is useless or group by StockID without counting distinct which is mildly less useless.
It seems you are looking for a select with group by over a distinct subquery:
select StockID, count(*)
from (select distinct StockID, Year(TransDate) as SaleYear from Sales) as t
group by StockID
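The same pattern can be tried out on toy data. This sketch runs in SQLite via Python's sqlite3 module, not in Access, so strftime('%Y', ...) stands in for Year(...); the sample rows and the aliases sale_year, years_sold and t are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (StockID INTEGER, TransDate TEXT);
    INSERT INTO Sales VALUES
        (1, '2019-07-02'), (1, '2019-07-10'),  -- two sales in the same year
        (1, '2021-07-05'),
        (2, '2020-07-03');
""")

# The inner DISTINCT collapses each item to one row per year it sold in;
# the outer GROUP BY then counts those rows per item.
years_per_item = dict(conn.execute("""
    SELECT StockID, COUNT(*) AS years_sold
    FROM (SELECT DISTINCT StockID, strftime('%Y', TransDate) AS sale_year
          FROM Sales) AS t
    GROUP BY StockID
""").fetchall())
print(years_per_item)
```

Item 1 sold in two distinct years even though it has three sales rows, which is the per-item distinct count the question is after.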

How to calculate the average from a count result in sql

I have the following tables:
VISITS:
vid,
pid,
date
PATIENT:
pid,
pname,
age,
gender
So I want to know the average number of visits for each patient.
I have tried hard to solve it, but still can't get it done!
Hope someone can help me out.
You can do this by counting the total number of visits and dividing by the total number of patients.
select (select count(*) from visits) / (select 1.0*count(*) from patient) as AvgVisitsPerPatient
Note the following:
The 1.0* is needed to change the count to a decimal. SQL Server does integer division on integers.
The use of the patient table. Some patients may not have any visits, and they are included.
The use of nested subqueries in the select. This is allowed.
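To illustrate why dividing by the patient table matters, here is a sketch on toy data, run in SQLite via Python's sqlite3 module rather than SQL Server. The rows are invented; one patient has no visits. The second query, which averages per-patient counts taken from visits alone, is included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE visits (vid INTEGER PRIMARY KEY, pid INTEGER, date TEXT);
    INSERT INTO patient VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');  -- cat never visits
    INSERT INTO visits (pid, date) VALUES
        (1, '2020-01-05'), (1, '2020-02-07'), (1, '2020-02-20'), (2, '2020-03-01');
""")

# Total visits divided by all patients, including those without visits.
avg_all_patients = conn.execute("""
    SELECT (SELECT COUNT(*) FROM visits) / (SELECT 1.0 * COUNT(*) FROM patient)
""").fetchone()[0]

# Average of per-patient counts from visits only: patients with no
# visits never show up in the grouped rows, so they are left out.
avg_visiting_patients = conn.execute("""
    SELECT AVG(visit_count)
    FROM (SELECT pid, COUNT(*) AS visit_count FROM visits GROUP BY pid) AS z
""").fetchone()[0]

print(avg_all_patients, avg_visiting_patients)
```

With 4 visits and 3 patients the first figure is 4/3, while the second is 2.0 because only the two visiting patients are averaged.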
EDIT:
You do not have enough information to calculate averages for a given unit of time. Although you have the date for a visit, you don't have a date for the patient, so you don't know who the population is at any given point of time.
The Average number of visits for each patient is not possible. There is only one scalar value for the total number of visits for any single patient. You cannot average that. You could average the number of visits for all patients, or the number of visits per month for each patient, but you cannot average a set of values when there is only one value in the set. (or to be technically accurate, you can average a single value, but the average is the same as the value because the count is one.)
Average number of visits for all patients =
Sum of visits / number of patients
Select Avg(1.0 * visitCount)
From (Select pid, Count(*) visitCount
From Visits
Group By pid) z
Average number of visits per month for each patient
Select p.pid, p.pname,
Avg(1.0 * z.monthVisits) AvgMonthlyVisits
From (Select pid, Count(*) monthVisits
From Visits
Group By pid, Year(date), Month(date)) z
join Patient p
on p.pid = z.pid
Group By p.pid, p.pname