SQL to get distinct statistics - sql

Suppose I have data in table X:
id assign team
----------------------
1 hunkim A
1 ygg A
2 hun B
2 gw B
2 david B
3 haha A
I want to know how many distinct assigns there are for each id. I can get that using:
select id, count(distinct assign) from
X group by id
order by count(distinct assign) desc;
It gives me something like:
2 3
1 2
3 1
My question is: how can I get the average of all the assign counts?
In addition, I want to know the average per team, so I want to get something like:
team assign_avg
-------------------
A 1.5
B 3
Thanks in advance!

SELECT
AVG(CAST(assign_count AS DECIMAL(10, 4)))
FROM
(SELECT
id,
COUNT(DISTINCT assign) AS assign_count
FROM
X
GROUP BY
id) Assign_Counts
For the average per team:
SELECT
team,
AVG(CAST(assign_count AS DECIMAL(10, 4)))
FROM
(SELECT
id,
team,
COUNT(DISTINCT assign) AS assign_count
FROM
X
GROUP BY
id,
team) Assign_Counts
GROUP BY
Team
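To check the two queries above against the sample data, here is a minimal sketch that runs them in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever database the asker uses, and the CAST is left out because SQLite's AVG already returns a floating-point value; the variable names are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's table X (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (id INTEGER, assign TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    [
        (1, "hunkim", "A"), (1, "ygg", "A"),
        (2, "hun", "B"), (2, "gw", "B"), (2, "david", "B"),
        (3, "haha", "A"),
    ],
)

# Average of the per-id distinct assign counts (2, 3 and 1).
overall_avg = conn.execute(
    """
    SELECT AVG(assign_count)
    FROM (SELECT id, COUNT(DISTINCT assign) AS assign_count
          FROM X
          GROUP BY id) Assign_Counts
    """
).fetchone()[0]

# Same idea per team: count per (id, team) first, then average by team.
team_avgs = conn.execute(
    """
    SELECT team, AVG(assign_count)
    FROM (SELECT id, team, COUNT(DISTINCT assign) AS assign_count
          FROM X
          GROUP BY id, team) Assign_Counts
    GROUP BY team
    ORDER BY team
    """
).fetchall()

print(overall_avg)  # 2.0
print(team_avgs)    # [('A', 1.5), ('B', 3.0)]
```

The inner query does the distinct counting and the outer query only averages, which is why the per-team version needs team in the inner GROUP BY as well.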

What you want can be done in one query, using the aggregate functions COUNT and AVG, provided the database supports window functions over aggregates:
SELECT t.id,
COUNT(DISTINCT t.assign) AS num_assigns,
AVG(COUNT(DISTINCT t.assign) * 1.0) OVER () AS assign_avg
FROM X t
GROUP BY t.id
Columns that do not have an aggregate function applied to them must be listed in the GROUP BY clause.

Related

Grouping over multiple columns and counting distinct over different groups

Given this data
month id
-----------
1 x
1 x
1 y
2 z
2 x
2 y
My output should be
month distinct_id total_id
---------------------------
1 2 3
2 3 3
How can I achieve this in a single query?
I tried this query
SELECT TO_CHAR(DOCDATE,'MON') MON
,COUNT(DISTINCT T.MOB_MTCHED_LYLTY_ID) OVER() SHARE
from data
group by 1
but this is giving me an error
select month,
count(distinct id) distinct_id,
count(id) total_id
from data
group by month;
SELECT [Month], COUNT(DISTINCT id) as dist_id, COUNT(id) as count_id
FROM data
GROUP BY Month
I should also say:
About your code: don't use OVER if it's not necessary.
Don't use pictures in your question as you do now; providing the data in a small table is better.
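As a quick check of the plain GROUP BY answer, this sketch loads the sample rows into an in-memory SQLite database (an assumption, the asker's query looks like Oracle) and runs the same query with an ORDER BY added for a stable result.

```python
import sqlite3

# In-memory SQLite table holding the six sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (month INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "y"), (2, "z"), (2, "x"), (2, "y")],
)

# COUNT(DISTINCT id) ignores repeats within a month, COUNT(id) does not.
month_stats = conn.execute(
    """
    SELECT month,
           COUNT(DISTINCT id) AS distinct_id,
           COUNT(id) AS total_id
    FROM data
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(month_stats)  # [(1, 2, 3), (2, 3, 3)]
```

No window function is needed here, since both counts are computed per month by the same GROUP BY.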

Sum distinct by separate ID column

I have some data of the form:
ID Value
A 2
B 2
C 3
A 2
A 2
C 3
B 2
I want to sum value by distinct IDs.
select sum(distinct value) from table would give the sum of 2 and 3 = 5. I don't want that, I want the sum of value for each ID, i.e. A=2, B=2, C=3, there's 3 distinct IDs so sum(2,2,3) = 7.
In 'sql-ish' I want something like select sum(distinct value by ID) from table. Is this possible?
Get the distinct combinations of ID and Value in a subquery and then the sum of Values:
SELECT SUM(Value) sum_value
FROM (SELECT DISTINCT ID, Value FROM tablename) t
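To see the difference between SUM(DISTINCT value) and the subquery approach on the sample data, here is a small sketch using an in-memory SQLite database from Python. The table name tablename is taken from the answer; SQLite itself is an assumption, as the question does not name a database.

```python
import sqlite3

# In-memory SQLite table with the seven sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("A", 2), ("B", 2), ("C", 3), ("A", 2), ("A", 2), ("C", 3), ("B", 2)],
)

# What the asker does NOT want: distinct values only (2 + 3).
sum_distinct_values = conn.execute(
    "SELECT SUM(DISTINCT Value) FROM tablename"
).fetchone()[0]

# What the asker wants: one value per distinct (ID, Value) pair (2 + 2 + 3).
sum_per_id = conn.execute(
    """
    SELECT SUM(Value) AS sum_value
    FROM (SELECT DISTINCT ID, Value FROM tablename) t
    """
).fetchone()[0]

print(sum_distinct_values)  # 5
print(sum_per_id)           # 7
```

Note that this relies on each ID always carrying the same Value, as in the sample data; an ID with two different values would contribute both of them to the sum.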
Another way to do it is with SUM() window function:
SELECT DISTINCT SUM(MAX(Value)) OVER() sum_value
FROM tablename
GROUP BY ID

How to get grouping of rows in SQL

I have a table like this:
id name
1 washing
1 cooking
1 cleaning
2 washing
2 cooking
3 cleaning
and I would like to have the following grouping:
id name count
1 washing,cooking,cleaning 3
2 washing,cooking 2
3 cleaning 1
I have tried to group by ID, but I can only show the count after grouping:
SELECT id,
COUNT(name)
FROM WORK
GROUP BY id
But this will only give the count and not the actual combination of names.
I am new to SQL. I know it has to be relational but there must be some way.
Thanks in advance!
In PostgreSQL you can use array_agg:
SELECT id, array_agg(name), COUNT(*)
FROM WORK
GROUP BY id
In MySQL you can use group_concat:
SELECT id, group_concat(name), COUNT(*)
FROM WORK
GROUP BY id
Or, for Redshift, listagg:
SELECT id, listagg(name, ','), COUNT(*)
FROM WORK
GROUP BY id
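The same idea can be tried out locally with SQLite, whose group_concat aggregate behaves much like the MySQL one and joins with a comma by default. This is only a sketch under that assumption; the order of names inside each concatenated string is not guaranteed, so the example compares them as sets.

```python
import sqlite3

# In-memory SQLite table with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO work VALUES (?, ?)",
    [
        (1, "washing"), (1, "cooking"), (1, "cleaning"),
        (2, "washing"), (2, "cooking"),
        (3, "cleaning"),
    ],
)

# One row per id: comma-separated names plus the number of rows in the group.
rows = conn.execute(
    """
    SELECT id, group_concat(name), COUNT(*)
    FROM work
    GROUP BY id
    ORDER BY id
    """
).fetchall()

# Split the concatenated names so the check does not depend on their order.
grouped = {row_id: (set(names.split(",")), n) for row_id, names, n in rows}

for row_id, (names, n) in grouped.items():
    print(row_id, sorted(names), n)
```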

Grouping by number of occurrences of a repeatable value in Oracle SQL

Let's assume we have a table like this:
id name value
1 x 12
2 x 23
3 y 47
4 x 18
5 y 29
6 z 45
7 y 67
Doing a normal group by name would yield:
select name,count(*) from table group by name;
name count(*)
x 3
y 3
z 1
I want to get the reverse, i.e. group the names by the number of times they occur. I want my output to be:
count number of elements occurring count times
1 1
3 2
Is it possible to do this using just a single query? Another way is to use a temp table, but I don't want to do that.
Thanks
You need one more group by:
select cnt, count(*), min(name), max(name)
from (select name, count(*) as cnt
from table
group by name
) n
group by cnt
order by 1;
I do these types of histogram queries all the time. The min() and max() provide sample data. This is useful to understand outliers and unexpected values.
You can GROUP BY twice, e.g.
with
Names as (
select name as name,
count(1) as cnt
from MyTable
group by name)
select count(1),
cnt
from Names
group by cnt
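The two-level GROUP BY can be checked against the sample rows with a short sketch like the one below. It uses an in-memory SQLite database instead of Oracle (an assumption made so the example runs anywhere) and the table name MyTable from the second answer, since table is a reserved word.

```python
import sqlite3

# In-memory SQLite table with the seven sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "x", 12), (2, "x", 23), (3, "y", 47), (4, "x", 18),
        (5, "y", 29), (6, "z", 45), (7, "y", 67),
    ],
)

# Inner query: occurrences per name. Outer query: how many names share each
# occurrence count, with min/max name as sample values.
histogram = conn.execute(
    """
    SELECT cnt, COUNT(*), MIN(name), MAX(name)
    FROM (SELECT name, COUNT(*) AS cnt
          FROM MyTable
          GROUP BY name) n
    GROUP BY cnt
    ORDER BY 1
    """
).fetchall()

print(histogram)  # [(1, 1, 'z', 'z'), (3, 2, 'x', 'y')]
```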

TSQL - Sum of Top 3 records of multiple teams

I am trying to generate a TSQL query that will take the top 3 scores (out of about 50) for a group of teams, sum the total of just those 3 scores and give me a result set that has just the name of the team, and that total score ordered by the score descending. I'm pretty sure it is a nested query - but for the life of me can't get it to work!
Here are the specifics, there is only 1 table involved....
table = comp_lineup (this table holds a separate record for each athlete in a match)
* athlete
* team
* score
There are many athletes to a match - each one belongs to a team.
Example:
id athlete team score
1 1 1 24
2 2 1 23
3 3 2 21
4 4 2 25
5 5 1 20
Thank You!
It is indeed a subquery, which I often put in a CTE instead just for clarity. The trick is the use of the rank() function.
;with RankedScores as (
select
id,
athlete,
team,
score,
rank() over (partition by team order by score desc) ScoreRank
from
#scores
)
select
Team,
sum(Score) TotalScore
from
RankedScores
where
ScoreRank <= 3
group by
team
order by
TotalScore desc
To get the top n values for every group of data, a query template is:
Select group_value, sum(value) total_value
From mytable ext
Where id in (Select top n id
From mytable sub
Where ext.group_value = sub.group_value
Order By value desc)
Group By group_value
The subquery retrieves only the IDs of the valid rows for the current group_value. The connection between the two datasets is the Where ext.group_value = sub.group_value part, and the Where in the main query masks every other ID, like a cursor.
For the specific question the template becomes:
Select team, sum(score) total_score
From mytable ext
Where id in (Select top 3 id
From mytable sub
Where ext.team = sub.team
Order By score desc)
Group By team
Order By sum(score) Desc
with the added Order By in the main query for the descending total score
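The correlated-subquery template can be tried out with the sketch below. Two things are assumptions: it runs on SQLite rather than SQL Server, so TOP 3 is written as LIMIT 3, and one extra row (id 6) is added to the question's sample data so that team 1 has more than three scores and the cut-off actually matters.

```python
import sqlite3

# In-memory SQLite table named after the asker's comp_lineup table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comp_lineup (id INTEGER, athlete INTEGER, team INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO comp_lineup VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 24), (2, 2, 1, 23), (3, 3, 2, 21), (4, 4, 2, 25), (5, 5, 1, 20),
        (6, 6, 1, 10),  # hypothetical extra row, not in the original question
    ],
)

# For each outer row, the subquery lists the ids of the three best scores of
# that row's team; rows outside that list are filtered out before summing.
top3_totals = conn.execute(
    """
    SELECT team, SUM(score) AS total_score
    FROM comp_lineup ext
    WHERE id IN (SELECT id
                 FROM comp_lineup sub
                 WHERE sub.team = ext.team
                 ORDER BY score DESC
                 LIMIT 3)
    GROUP BY team
    ORDER BY total_score DESC
    """
).fetchall()

print(top3_totals)  # [(1, 67), (2, 46)]
```

Team 1's score of 10 is dropped because it is the fourth best, while team 2 only has two scores and keeps both.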