Grouping counts into overlapping categories - sql

I'm looking to group counts into categories of (0+, 5+, 10+, 15+, etc.)
So an agent with 7 leads should be counted in the 0+, 5+ groups, but not 10+, 15+.
Postgres Query:
WITH agent_stats AS (
SELECT agent_id, FLOOR(COUNT(*) / 5) * 5 AS count_category
FROM leads
GROUP BY 1
)
SELECT count_category, COUNT(*)
FROM agent_stats
GROUP BY 1
ORDER BY 1
Result:
| count_category | count |
| 0 | 12 |
| 5 | 18 |
| 15 | 9 |
| 20 | 4 |
Desired:
| count_category | count |
| 0 | 43 |
| 5 | 31 |
| 15 | 13 |
| 20 | 4 |

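The simplest way is probably a cumulative sum over the per-bucket counts, running from the highest bucket down:
WITH agent_stats AS (
SELECT agent_id, FLOOR(COUNT(*) / 5) * 5 AS count_category -- integer division, scaled back to the bucket's lower bound
FROM leads
GROUP BY 1
)
SELECT count_category, COUNT(*) AS agents_in_bucket,
SUM(COUNT(*)) OVER (ORDER BY count_category DESC) AS agents_at_or_above
FROM agent_stats
GROUP BY 1
ORDER BY 1;
Because the window is ordered by count_category DESC, each row's running total covers its own bucket and every higher one, which is the "N+" count you are after.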
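To check the cumulative-sum idea on a small data set, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Postgres. The lead counts per agent are made up, the bucket label is computed as (COUNT(*) / 5) * 5 so that it reads as the bucket's lower bound, and the per-bucket count sits in its own CTE purely to keep each step visible. It assumes the bundled SQLite is 3.25 or newer, since that is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (agent_id INTEGER)")

# Made-up data for illustration: agent_id -> number of leads.
lead_counts = {1: 3, 2: 7, 3: 12, 4: 6}
conn.executemany(
    "INSERT INTO leads (agent_id) VALUES (?)",
    [(agent,) for agent, n in lead_counts.items() for _ in range(n)],
)

query = """
WITH agent_stats AS (
    -- Integer division, then scale back: 7 leads -> bucket 5, 12 leads -> bucket 10.
    SELECT agent_id, (COUNT(*) / 5) * 5 AS count_category
    FROM leads
    GROUP BY agent_id
),
bucket_counts AS (
    SELECT count_category, COUNT(*) AS agents_in_bucket
    FROM agent_stats
    GROUP BY count_category
)
SELECT count_category,
       agents_in_bucket,
       -- Running total from the highest bucket downwards.
       SUM(agents_in_bucket) OVER (ORDER BY count_category DESC) AS agents_at_or_above
FROM bucket_counts
ORDER BY count_category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data the agent with 7 leads lands in bucket 5 and is counted in both the 0+ and 5+ totals, but not in 10+.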

Related

Get the position of X user in the ranking

I have these tables
RANKING
+-----------+----------+
| id_users | points |
+-----------+----------+
| 1 | 27 | //3rd
| 2 | 55 | //1st
| 3 | 9 | //5th
| 4 | 14 | //4th
| 5 | 38 | //2nd
+-----------+----------+
I would like to retrieve user's data along with its ranking position, filtering by id. So for example if I want info for id 3 I should get
+----------+--------+---------------+
| id_users | points | rank_position |
+----------+--------+---------------+
| 3 | 9 | 5 |
+----------+--------+---------------+
My query currently looks like this:
SELECT
ROW_NUMBER() OVER (ORDER BY points ASC) AS RowNum,
id_users
FROM
RANKING
I don't know how to continue from here.
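If you use ROW_NUMBER(), you need to compute it in a subquery, so that the position is calculated over all rows before you filter by id. Order by points DESC so that the highest score gets position 1:
SELECT r.*
FROM (SELECT r.*,
ROW_NUMBER() OVER (ORDER BY points DESC) AS rank_position
FROM RANKING r
) r
WHERE id_users = 3;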
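The reason for the subquery is that WHERE is applied before window functions are evaluated. The sketch below tries both orders on the question's data, using Python's built-in sqlite3 module as a stand-in database (an assumption, the question does not name one). It ranks by points DESC so that the highest score is position 1, as in the question's annotations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranking (id_users INTEGER PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO ranking VALUES (?, ?)",
    [(1, 27), (2, 55), (3, 9), (4, 14), (5, 38)],
)

# Filtering in the same SELECT: only one row survives the WHERE clause,
# so ROW_NUMBER() has nothing else to count and returns 1.
filtered_first = conn.execute(
    """
    SELECT id_users, points,
           ROW_NUMBER() OVER (ORDER BY points DESC) AS rank_position
    FROM ranking
    WHERE id_users = ?
    """,
    (3,),
).fetchone()

# Ranking in a subquery, filtering outside: the position reflects all users.
ranked_first = conn.execute(
    """
    SELECT id_users, points, rank_position
    FROM (SELECT id_users, points,
                 ROW_NUMBER() OVER (ORDER BY points DESC) AS rank_position
          FROM ranking) AS ranked
    WHERE id_users = ?
    """,
    (3,),
).fetchone()

print(filtered_first)  # position computed after filtering
print(ranked_first)    # position computed before filtering
```

If ties should share a position, RANK() or DENSE_RANK() can be swapped in for ROW_NUMBER() in the subquery.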

Combine PARTITION BY and GROUP BY

I have a (mssql) table like this:
+----+----------+---------+--------+--------+
| id | username | date | scoreA | scoreB |
+----+----------+---------+--------+--------+
| 1 | jim | 01/2020 | 100 | 0 |
| 2 | max | 01/2020 | 0 | 200 |
| 3 | jim | 01/2020 | 0 | 150 |
| 4 | max | 02/2020 | 150 | 0 |
| 5 | jim | 02/2020 | 0 | 300 |
| 6 | lee | 02/2020 | 100 | 0 |
| 7 | max | 02/2020 | 0 | 200 |
+----+----------+---------+--------+--------+
What I need is the best "combined" score per date. (By "combined" score I mean the sum of a user's best scoreA and best scoreB for that date.)
The result should look like this:
+----------+---------+--------------------------------------------+
| username | date | combined_score (max(scoreA) + max(scoreB)) |
+----------+---------+--------------------------------------------+
| jim | 01/2020 | 250 |
| max | 02/2020 | 350 |
+----------+---------+--------------------------------------------+
This is how far I got.
I can group the scores by user like this:
SELECT
username, (max(scoreA) + max(scoreB)) AS combined_score
FROM score_table
GROUP BY username
ORDER BY combined_score DESC
And I can get the best score per date with PARTITION BY like this:
SELECT *
FROM
(SELECT t.*, row_number() OVER (PARTITION BY date ORDER BY scoreA DESC) rn
FROM score_table t) as tmp
WHERE tmp.rn = 1
ORDER BY date
Is there a proper way to combine these statements and get the result I need? Thank you!
Btw. Don't care about possible ties!
You can combine window functions and aggregate functions in the same query, then filter on the row number:
SELECT s.username, s.date, s.combined_score
FROM (SELECT username, date, (max(scoreA) + max(scoreB)) AS combined_score,
ROW_NUMBER() OVER (PARTITION BY date ORDER BY max(scoreA) + max(scoreB) DESC) as seqnum
FROM score_table
GROUP BY username, date
) s
WHERE s.seqnum = 1
ORDER BY s.date;
Note that date needs to be part of the GROUP BY, so that each user gets one combined score per date before the rows are ranked.
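As a sanity check against the question's sample rows, here is a small sketch that runs the same idea in two explicit steps: aggregate per user and date in one CTE, then number the rows per date and keep the first. It uses Python's built-in sqlite3 module instead of SQL Server, so treat it as an illustration of the logic rather than of T-SQL specifics. The CTE names per_user and ranked are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_table (id INTEGER, username TEXT, date TEXT, scoreA INTEGER, scoreB INTEGER)"
)
conn.executemany(
    "INSERT INTO score_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "jim", "01/2020", 100, 0),
        (2, "max", "01/2020", 0, 200),
        (3, "jim", "01/2020", 0, 150),
        (4, "max", "02/2020", 150, 0),
        (5, "jim", "02/2020", 0, 300),
        (6, "lee", "02/2020", 100, 0),
        (7, "max", "02/2020", 0, 200),
    ],
)

query = """
WITH per_user AS (
    -- One combined score per user and date.
    SELECT username, date, MAX(scoreA) + MAX(scoreB) AS combined_score
    FROM score_table
    GROUP BY username, date
),
ranked AS (
    -- Number the users within each date, best combined score first.
    SELECT username, date, combined_score,
           ROW_NUMBER() OVER (PARTITION BY date ORDER BY combined_score DESC) AS seqnum
    FROM per_user
)
SELECT username, date, combined_score
FROM ranked
WHERE seqnum = 1
ORDER BY date
"""
best_per_date = conn.execute(query).fetchall()
for row in best_per_date:
    print(row)
```

Ties are broken arbitrarily by ROW_NUMBER(), which matches the question's "don't care about ties" requirement.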

select rows based on equal columns values

Consider a table with these columns:
Id
fk_newsId
fk_NewsGroupId
fk_NewsZoneId
I need to select all records that share the same fk_NewsGroup and fk_NewsZone,
something like this:
+----+-----------+--------------+-------------+
| Id | fk_NewsId | fk_NewsGroup | fk_NewsZone |
+----+-----------+--------------+-------------+
| 1 | 60 | 5 | 8 |
| 2 | 30 | 5 | 8 |
| 3 | 31 | 9 | 20 |
| 4 | 5 | 9 | 20 |
| 5 | 12 | 9 | 20 |
| 6 | 1000 | 20 | 11 |
| 7 | 21 | 20 | 11 |
| 8 | 6 | 20 | 11 |
+----+-----------+--------------+-------------+
How can I do that?
I tried GROUP BY like this,
but it doesn't give the desired output:
select fk_NewsId, fk_NewsGroup,fk_NewsZone from tbl_test
group by fk_NewsGroup,fk_NewsZone,fk_NewsId
You can use COUNT as a window function to count the rows for each combination of fk_NewsGroup and fk_NewsZone,
then keep the rows whose count is greater than one.
SELECT *
FROM (
SELECT *,COUNT(*) OVER(PARTITION BY fk_NewsGroup,fk_NewsZone) cnt
FROM tbl_test
)t1
where t1.cnt > 1
It's not entirely clear what you mean, but something like this should work:
SELECT t.Id, t.fk_NewsId, t.fk_NewsGroup, t.fk_NewsZone FROM tbl_test t
INNER JOIN (
SELECT fk_NewsGroup,fk_NewsZone, COUNT(*) AS Counted FROM tbl_test
GROUP BY fk_NewsGroup,fk_NewsZone
HAVING COUNT(*) > 1) g
ON t.fk_NewsGroup = g.fk_NewsGroup
AND t.fk_NewsZone = g.fk_NewsZone
I would use GROUP BY like this, which returns one row per fk_NewsGroup and fk_NewsZone pair:
select max(id) as Id, Max(fk_NewsId) as fk_NewsId, fk_NewsGroup,fk_NewsZone from #temp
group by fk_NewsGroup,fk_NewsZone
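To compare the window-function and join approaches above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. It loads the question's rows plus one extra row (Id 9) that is purely hypothetical, added so that there is a group/zone pair with a single member to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_test (Id INTEGER, fk_NewsId INTEGER, fk_NewsGroup INTEGER, fk_NewsZone INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_test VALUES (?, ?, ?, ?)",
    [
        (1, 60, 5, 8), (2, 30, 5, 8),
        (3, 31, 9, 20), (4, 5, 9, 20), (5, 12, 9, 20),
        (6, 1000, 20, 11), (7, 21, 20, 11), (8, 6, 20, 11),
        (9, 77, 1, 2),  # hypothetical row: the only member of its group/zone pair
    ],
)

# Approach 1: count rows per (group, zone) with a window function, keep counts > 1.
window_ids = [r[0] for r in conn.execute("""
    SELECT Id
    FROM (SELECT Id, COUNT(*) OVER (PARTITION BY fk_NewsGroup, fk_NewsZone) AS cnt
          FROM tbl_test) AS t1
    WHERE cnt > 1
    ORDER BY Id
""")]

# Approach 2: find the repeated (group, zone) pairs with GROUP BY / HAVING, join back.
join_ids = [r[0] for r in conn.execute("""
    SELECT t.Id
    FROM tbl_test t
    INNER JOIN (SELECT fk_NewsGroup, fk_NewsZone
                FROM tbl_test
                GROUP BY fk_NewsGroup, fk_NewsZone
                HAVING COUNT(*) > 1) g
        ON t.fk_NewsGroup = g.fk_NewsGroup
       AND t.fk_NewsZone = g.fk_NewsZone
    ORDER BY t.Id
""")]

print(window_ids)
print(join_ids)
```

Both queries return the same rows here; the plain GROUP BY with MAX, by contrast, collapses each pair to a single row.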

Query to get the count of data for particular customer with all other data from table

My table structure is as follows:
group_id | cust_id | ticket_num
------------------------------
60 | 12 | 1
60 | 12 | 2
60 | 12 | 3
60 | 12 | 4
60 | 30 | 5
60 | 30 | 6
60 | 31 | 7
60 | 31 | 8
65 | 02 | 1
I want to fetch all the data for group_id=60 and find the count of ticket_num for each customer in that group. My output should be like this:
cust_id | ticket_count | ticket_num
------------------------------
12 | 4 | 1
12 | | 2
12 | | 3
12 | | 4
30 | 2 | 5
30 | | 6
31 | 2 | 7
31 | | 8
I tried this query:
SELECT gd.cust_id, Count(gd.cust_id),gd.ticket_num
FROM Group_details gd
WHERE gd.group_id = 60
GROUP BY gd.cust_id;
But this query is not working.
You appear to want the ANSI/ISO standard row_number() function and count() as a window function:
select gd.cust_id, count(*) over (partition by gd.cust_id) as num_tickets,
row_number() over (order by gd.cust_id, gd.ticket_num) as ticket_seqnum
from group_details gd
where gd.group_id = 60
order by ticket_seqnum;
Use an aggregate in a subquery and join it back to the table:
select t2.*,t1.ticket_num from Group_details t1
inner join
(
SELECT gd.cust_id, Count(gd.ticket_num) as ticket_count
FROM Group_details gd where gd.group_id = 60
GROUP BY gd.cust_id
) t2 on t1.cust_id=t2.cust_id
where t1.group_id = 60
http://sqlfiddle.com/#!9/dd718b/1
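For reference, here is a minimal sketch of the window-function approach run against the question's rows, using Python's built-in sqlite3 module as a stand-in database. The extra column ticket_count_first_row_only is an addition of mine, not part of either answer: it shows one possible way to print the count only on each customer's first row, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_details (group_id INTEGER, cust_id INTEGER, ticket_num INTEGER)")
conn.executemany(
    "INSERT INTO group_details VALUES (?, ?, ?)",
    [
        (60, 12, 1), (60, 12, 2), (60, 12, 3), (60, 12, 4),
        (60, 30, 5), (60, 30, 6),
        (60, 31, 7), (60, 31, 8),
        (65, 2, 1),
    ],
)

query = """
SELECT cust_id,
       -- Per-customer count repeated on every row of that customer.
       COUNT(*) OVER (PARTITION BY cust_id) AS ticket_count,
       -- Same count, but NULL on all rows except the customer's first ticket.
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY ticket_num) = 1
            THEN COUNT(*) OVER (PARTITION BY cust_id)
       END AS ticket_count_first_row_only,
       ticket_num
FROM group_details
WHERE group_id = 60
ORDER BY cust_id, ticket_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Blanking repeated values is often easier to do in the presentation layer, so the CASE expression is optional.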

Sum length of overlapping intervals

I've got a table in a Redshift database that contains intervals which are grouped and that potentially overlap, like so:
| interval_id | l | u | group |
| ----------- | -- | -- | ----- |
| 1 | 1 | 10 | A |
| 2 | 2 | 5 | A |
| 3 | 5 | 15 | A |
| 4 | 26 | 30 | B |
| 5 | 28 | 35 | B |
| 6 | 30 | 31 | B |
| 7 | 44 | 45 | B |
| 8 | 56 | 58 | C |
What I would like to do is to determine the length of the union of the intervals within each group. That is, for each interval take u - l, sum over all group members, and then subtract the length of the overlaps between the intervals.
Desired result:
| group | length |
| ----- | ------ |
| A | 14 |
| B | 10 |
| C | 2 |
This question has been asked before, alas it seems that all of the solutions in that thread use features that Redshift doesn't support.
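This is not difficult but requires multiple steps. The key is to define the "islands" within each group and then aggregate over those. Lots of subqueries, aggregations, and window functions. Here t is your table and groupId stands for its group column.
select groupId, sum(ul) as total_length
from (select groupId, (max(u) - min(l)) as ul -- length of one island, measured as u - l like the desired result
      from (select t.*,
                   sum(case when prev_max_u < l then 1 else 0 end)
                       over (partition by groupId order by l
                             rows between unbounded preceding and current row) as grp
            from (select t.*,
                         max(u) over (partition by groupId order by l
                                      rows between unbounded preceding and 1 preceding) as prev_max_u
                  from t
                 ) t
           ) t
      group by groupId, grp
     ) g
group by groupId;
Both windows are partitioned by groupId so that intervals from different groups never merge, and both carry an explicit frame clause, which Redshift requires for aggregate window functions that use ORDER BY.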
The idea is to determine whether each record starts a new island. For this purpose, it computes a cumulative max of u over all preceding records and compares it with the current l. If the previous max is smaller than l, there is a gap and a new island begins; a cumulative sum of these gap flags numbers the islands.
The rest is just aggregation. And more aggregation.
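To see the islands logic reproduce the desired result, here is a minimal sketch that runs it on the question's data with Python's built-in sqlite3 module instead of Redshift. The table name intervals and the column name group_name are assumptions for this example (group is a reserved word), the island length is taken as max(u) - min(l) to match the desired lengths, and interval_id is added to the ordering as a tiebreaker so that both windows see the rows in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intervals (interval_id INTEGER, l INTEGER, u INTEGER, group_name TEXT)")
conn.executemany(
    "INSERT INTO intervals VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "A"), (2, 2, 5, "A"), (3, 5, 15, "A"),
        (4, 26, 30, "B"), (5, 28, 35, "B"), (6, 30, 31, "B"), (7, 44, 45, "B"),
        (8, 56, 58, "C"),
    ],
)

query = """
WITH with_prev AS (
    -- Largest upper bound among all earlier intervals of the same group.
    SELECT interval_id, group_name, l, u,
           MAX(u) OVER (PARTITION BY group_name ORDER BY l, u, interval_id
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max_u
    FROM intervals
),
with_island AS (
    -- A gap (prev_max_u < l) starts a new island; the running sum numbers the islands.
    SELECT group_name, l, u,
           SUM(CASE WHEN prev_max_u < l THEN 1 ELSE 0 END)
               OVER (PARTITION BY group_name ORDER BY l, u, interval_id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island
    FROM with_prev
),
islands AS (
    SELECT group_name, island, MAX(u) - MIN(l) AS island_length
    FROM with_island
    GROUP BY group_name, island
)
SELECT group_name, SUM(island_length) AS total_length
FROM islands
GROUP BY group_name
ORDER BY group_name
"""
lengths = conn.execute(query).fetchall()
for row in lengths:
    print(row)
```

Group B illustrates the island split: intervals 4 to 6 merge into [26, 35] with length 9, interval 7 forms its own island of length 1, and the two add up to 10.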