Group By with MAX value from another column - sql

Table FieldStudies is:
| ID | Name                   |
|----|------------------------|
| 1  | Industrial Engineering |
| 2  | Civil Engineering      |
| 3  | Architecture           |
| 4  | Chemistry              |
And table Eductionals is:
| ID | UserID | Degree | FieldStudy_ID |
|----|--------|--------|---------------|
| 1  | 100    | 3      | 4             |
| 2  | 101    | 2      | 2             |
| 3  | 101    | 3      | 2             |
| 4  | 101    | 4      | 3             |
| 5  | 103    | 3      | 4             |
| 6  | 103    | 4      | 2             |
I want to count the students in each field of study, considering only each student's highest Degree.
Desired output:
| ID | Name                   | Count |
|----|------------------------|-------|
| 1  | Industrial Engineering | 0     |
| 2  | Civil Engineering      | 0     |
| 3  | Architecture           | 1     |
| 4  | Chemistry              | 2     |
I have tried:
select Temptable2.* , count(*) As CountField from
(select fs.*
from FieldStudies fs
left outer join
(select e.UserID , Max(e.Degree) As ID_Degree , e.FieldStudy_ID
from Eductionals e
group by e.UserID) Temptable
ON fs.ID = Temptable.FieldStudy_ID) Temptable2
group by Temptable2.ID
But I get the following error:
Column 'Eductionals.FieldStudy_ID' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.

If I understand correctly, you want only the highest degree for each person. If so, you can use row_number() to whittle down the multiple rows for a given person and the rest is aggregation and join:
select fs.id, fs.Name, count(e.id)
from fieldstudies fs left join
(select e.*,
row_number() over (partition by userid order by degree desc) as seqnum
from Eductionals e
) e
on e.FieldStudy_ID = fs.id and seqnum = 1
group by fs.id, fs.Name
order by fs.id;
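A minimal sketch that runs the row_number() answer on the question's sample data, using Python's built-in sqlite3 module as a stand-in for the asker's database (an assumption; it needs SQLite 3.25+ for window functions). Note that on the rows exactly as listed, users 100, 101 and 103 have their highest Degree in Chemistry, Architecture and Civil Engineering respectively, so the query returns 0, 1, 1, 1 rather than the 0, 0, 1, 2 in the desired output; the sample data or the desired output may have been transcribed inaccurately.

```python
import sqlite3

# In-memory database holding the question's sample data.
# Assumption: SQLite 3.25+ (window functions) stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FieldStudies (ID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO FieldStudies VALUES
  (1, 'Industrial Engineering'), (2, 'Civil Engineering'),
  (3, 'Architecture'), (4, 'Chemistry');

CREATE TABLE Eductionals (ID INTEGER PRIMARY KEY, UserID INTEGER,
                          Degree INTEGER, FieldStudy_ID INTEGER);
INSERT INTO Eductionals VALUES
  (1, 100, 3, 4), (2, 101, 2, 2), (3, 101, 3, 2),
  (4, 101, 4, 3), (5, 103, 3, 4), (6, 103, 4, 2);
""")

# Number each user's rows from highest Degree down, keep only seqnum = 1
# in the join, then count the surviving rows per field of study.
QUERY = """
SELECT fs.ID, fs.Name, COUNT(e.ID) AS "Count"
FROM FieldStudies fs
LEFT JOIN (
  SELECT e.*,
         ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Degree DESC) AS seqnum
  FROM Eductionals e
) e ON e.FieldStudy_ID = fs.ID AND e.seqnum = 1
GROUP BY fs.ID, fs.Name
ORDER BY fs.ID
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps fields with no matching student, and COUNT(e.ID) ignores the NULLs it produces, which is why Industrial Engineering shows 0 instead of 1.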

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count for each app as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using DISTINCT ON that gives the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count for each app as best_selling_app_count.
Your question is quite complicated, and so is the SQL query. However, the sentence quoted above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
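A minimal sketch of the same "most common app per referral" idea, run through Python's sqlite3 module. SQLite has no DISTINCT ON, so this swaps in ROW_NUMBER() over the per-(referral, app) counts; only the columns the sketch needs are created. It should pick the same rows as the DISTINCT ON query except when two apps tie for a referral: there, DISTINCT ON's choice is arbitrary unless its ORDER BY includes a tiebreaker, which is the reliability concern raised in the question. Here app_id is added as an assumed tiebreaker.

```python
import sqlite3

# Only the columns this sketch uses; data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referrals (id INTEGER PRIMARY KEY, firstname TEXT);
INSERT INTO referrals VALUES (3, 'c'), (5, 'e'), (4, 'd'), (2, 'b'), (1, 'a');

CREATE TABLE activations (id INTEGER PRIMARY KEY, referral_id INTEGER, app_id TEXT);
INSERT INTO activations VALUES
  (2, 3, 'a'), (4, 1, 'b'), (5, 4, 'c'),
  (1, 2, 'b'), (3, 2, 'b'), (6, 2, 'a');
""")

# Count activations per (referral, app), rank apps within each referral by
# that count (app_id breaks ties so the pick is repeatable), keep rank 1.
QUERY = """
SELECT r.id, a.app_id AS most_common_app_id, a.cnt AS most_common_app_id_count
FROM referrals r
LEFT JOIN (
  SELECT referral_id, app_id, cnt,
         ROW_NUMBER() OVER (PARTITION BY referral_id
                            ORDER BY cnt DESC, app_id) AS rn
  FROM (SELECT referral_id, app_id, COUNT(*) AS cnt
        FROM activations
        GROUP BY referral_id, app_id) AS counts
) a ON a.referral_id = r.id AND a.rn = 1
ORDER BY r.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each referral appears exactly once; referral 5 has no activations, so its app columns come back as NULL (None in Python).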
You have not explained the other columns that are in your result set.

Each row's values as columns

I'm trying to create a view that shows the first table's columns plus the second table's first 3 records, sorted by date, in one row.
I tried selecting specific rows from the sub-table with OFFSET and joining them to the main table, but because the joined query's result is ordered by date, without a
WHERE tblMain_id = ..
clause in the joined SQL it returns the wrong record.
Here is an SQL Fiddle example: sqlfiddle demo
tblMain
| id | fname | lname | salary |
+----+-------+-------+--------+
| 1 | John | Doe | 1000 |
| 2 | Bob | Ross | 5000 |
| 3 | Carl | Sagan | 2000 |
| 4 | Daryl | Dixon | 3000 |
tblSub
| id | email | emaildate | tblmain_id |
+----+-----------------+------------+------------+
| 1 | John#Doe1.com | 2019-01-01 | 1 |
| 2 | John#Doe2.com | 2019-01-02 | 1 |
| 3 | John#Doe3.com | 2019-01-03 | 1 |
| 4 | Bob#Ross1.com | 2019-02-01 | 2 |
| 5 | Bob#Ross2.com | 2018-12-01 | 2 |
| 6 | Carl#Sagan.com | 2019-10-01 | 3 |
| 7 | Daryl#Dixon.com | 2019-11-01 | 4 |
The view I am trying to achieve:
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 | John#Doe3.com | 2019-01-03 |
The view I have created:
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | (null) | (null) | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 |
You can use conditional aggregation:
select m.id, m.fname, m.lname, m.salary,
max(s.email) filter (where seqnum = 1) as email_1,
max(s.emailDate) filter (where seqnum = 1) as emailDate_1,
max(s.email) filter (where seqnum = 2) as email_2,
max(s.emailDate) filter (where seqnum = 2) as emailDate_2,
max(s.email) filter (where seqnum = 3) as email_3,
max(s.emailDate) filter (where seqnum = 3) as emailDate_3
from tblMain m left join
(select s.*,
row_number() over (partition by tblMain_id order by emailDate) as seqnum
from tblsub s
) s
on s.tblMain_id = m.id
where m.id = 1
group by m.id, m.fname, m.lname, m.salary;
Here is a SQL Fiddle.
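A runnable sketch of the conditional-aggregation pivot, using Python's sqlite3 module in place of PostgreSQL (an assumption; SQLite 3.25+ for ROW_NUMBER()). It writes MAX(CASE ...) instead of FILTER, since aggregate FILTER only arrived in SQLite 3.30, numbers emails oldest first to match the desired view, and drops the WHERE m.id = 1 filter so every main row is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblMain (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, salary INTEGER);
INSERT INTO tblMain VALUES
  (1, 'John', 'Doe', 1000), (2, 'Bob', 'Ross', 5000),
  (3, 'Carl', 'Sagan', 2000), (4, 'Daryl', 'Dixon', 3000);

CREATE TABLE tblSub (id INTEGER PRIMARY KEY, email TEXT, emaildate TEXT,
                     tblmain_id INTEGER);
INSERT INTO tblSub VALUES
  (1, 'John#Doe1.com', '2019-01-01', 1), (2, 'John#Doe2.com', '2019-01-02', 1),
  (3, 'John#Doe3.com', '2019-01-03', 1), (4, 'Bob#Ross1.com', '2019-02-01', 2),
  (5, 'Bob#Ross2.com', '2018-12-01', 2), (6, 'Carl#Sagan.com', '2019-10-01', 3),
  (7, 'Daryl#Dixon.com', '2019-11-01', 4);
""")

# Number each main row's emails oldest-first, then pivot slots 1..3 into
# columns with MAX(CASE ...). Rows with fewer than 3 emails get NULLs.
QUERY = """
SELECT m.id, m.fname, m.lname, m.salary,
  MAX(CASE WHEN s.seqnum = 1 THEN s.email END)     AS email_1,
  MAX(CASE WHEN s.seqnum = 1 THEN s.emaildate END) AS emaildate_1,
  MAX(CASE WHEN s.seqnum = 2 THEN s.email END)     AS email_2,
  MAX(CASE WHEN s.seqnum = 2 THEN s.emaildate END) AS emaildate_2,
  MAX(CASE WHEN s.seqnum = 3 THEN s.email END)     AS email_3,
  MAX(CASE WHEN s.seqnum = 3 THEN s.emaildate END) AS emaildate_3
FROM tblMain m
LEFT JOIN (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY tblmain_id ORDER BY emaildate) AS seqnum
  FROM tblSub s
) s ON s.tblmain_id = m.id
GROUP BY m.id, m.fname, m.lname, m.salary
ORDER BY m.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first row matches the desired view for John Doe. ISO-formatted date strings sort correctly as text, which is why emaildate can stay TEXT in this sketch.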
Here is a solution that should get you what you expect.
This works by first ranking records within each table and joining them together. Then, the outer query uses aggregation to generate the expected output.
This solution works even if the first record in the main table does not have id 1. Also, filtering occurs within the JOINs, so this should be quite efficient.
SELECT
m.id,
m.fname,
m.lname,
m.salary,
MAX(CASE WHEN s.rn = 1 THEN s.email END) email_1,
MAX(CASE WHEN s.rn = 1 THEN s.emaildate END) email_date1,
MAX(CASE WHEN s.rn = 2 THEN s.email END) email_2,
MAX(CASE WHEN s.rn = 2 THEN s.emaildate END) email_date2,
MAX(CASE WHEN s.rn = 3 THEN s.email END) email_3,
MAX(CASE WHEN s.rn = 3 THEN s.emaildate END) email_date3
FROM
(
SELECT m.*, ROW_NUMBER() OVER(ORDER BY id) rn
FROM tblMain m
) m
INNER JOIN (
SELECT
tblmain_id,
email,
emaildate,
ROW_NUMBER() OVER(PARTITION BY tblmain_id ORDER BY emaildate) rn
FROM tblSub
) s
ON m.id = s.tblmain_id
AND m.rn = 1
AND s.rn <= 3
GROUP BY
m.id,
m.fname,
m.lname,
m.salary

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
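A minimal sketch that checks the cumulative sum against the Expected column, run in SQLite through Python's sqlite3 module rather than SQL Server 2012 (an assumption; the window syntax used here is the same in both). The frame is spelled out as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without it, both engines default to a RANGE frame, which gives the same result here because Id is unique; in SQL Server the explicit ROWS frame is often recommended for performance on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INTEGER PRIMARY KEY, State INTEGER)")
states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]  # State for Id 1..11, from the question
conn.executemany("INSERT INTO tblState VALUES (?, ?)",
                 list(enumerate(states, start=1)))

# Running count of State = 0 rows, in Id order: each 0 starts a new group.
QUERY = """
SELECT Id, State,
       SUM(CASE WHEN State = 0 THEN 1 ELSE 0 END)
         OVER (ORDER BY Id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Expected
FROM tblState
ORDER BY Id
"""
expected = [row[2] for row in conn.execute(QUERY)]
print(expected)  # [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
```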
Another method uses a correlated subquery:
select t.*,
(select count(*)
from tblState t1
where t1.id <= t.id and t1.state = 0
) as expected
from tblState t;
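A sketch comparing the correlated subquery with < and with <= on the same sample data, again using SQLite via Python's sqlite3 as an assumed stand-in for SQL Server. Only <= reproduces the Expected column, because a row whose State is 0 has to count itself. The subquery re-counts the earlier rows for every row, so it typically scales worse than the window-function sum on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INTEGER PRIMARY KEY, State INTEGER)")
states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]  # State for Id 1..11
conn.executemany("INSERT INTO tblState VALUES (?, ?)",
                 list(enumerate(states, start=1)))

def run(op: str) -> list:
    # op is '<' or '<=' (fixed strings from this script, never user input);
    # it decides whether a State = 0 row is included in its own count.
    sql = f"""
    SELECT (SELECT COUNT(*) FROM tblState t1
            WHERE t1.Id {op} t.Id AND t1.State = 0) AS expected
    FROM tblState t
    ORDER BY t.Id
    """
    return [row[0] for row in conn.execute(sql)]

print(run("<"))   # [0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
print(run("<="))  # [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
```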

SQL Sum Columns

Given the two tables below, I want to get the following output:
Main table:
Email | Group | id
a#gmail.com | Y | 1
a#gmail.com | Y | 2
b#gmail.com | N | 3
c#gmail.com | N | 4
Join Table:
Email | Value
a#gmail.com | 10
b#gmail.com | 20
c#gmail.com | 30
Desired result (take the a#gmail.com value only once, even though it appears twice in the first table):
Group | Email Count | Sum
Y | 1 | 10
N | 2 | 50
Here is the sqlfiddle I've been playing around with:
http://sqlfiddle.com/#!9/c2a24d/8
You were close in your SQLFiddle. You just needed to join on a distinct select.
SELECT
e.Unsub as Unsub,
count(e.email) as EmailCount,
sum(c.sum) as EmailSum
FROM CountTable c
JOIN (select distinct email, Unsub from EmailsTable) e on c.email = e.email
GROUP BY e.unsub
SQLFiddle
First remove the duplicates, and then do the calculations.
SQL DEMO
SELECT filter.`Unsub`, COUNT(*), SUM(`sum`)
FROM (
SELECT DISTINCT `Unsub`, `email`
FROM EmailsTable ) as filter
JOIN CountTable
ON filter.`email` = CountTable.`email`
GROUP BY filter.`Unsub`
OUTPUT
| Unsub | COUNT(*) | SUM(`sum`) |
|-------|----------|------------|
| N | 2 | 50 |
| Y | 1 | 10 |
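A minimal sketch of the de-duplicate-then-aggregate approach, run in SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). Table and column names (EmailsTable, CountTable, Unsub, sum) follow the answers' SQLFiddle; the question's text calls the group column Group and the value column Value. The column named sum is quoted to keep it clearly separate from the SUM() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmailsTable (email TEXT, Unsub TEXT, id INTEGER);
INSERT INTO EmailsTable VALUES
  ('a#gmail.com', 'Y', 1), ('a#gmail.com', 'Y', 2),
  ('b#gmail.com', 'N', 3), ('c#gmail.com', 'N', 4);

CREATE TABLE CountTable (email TEXT, "sum" INTEGER);
INSERT INTO CountTable VALUES
  ('a#gmail.com', 10), ('b#gmail.com', 20), ('c#gmail.com', 30);
""")

# De-duplicate (email, Unsub) pairs first so a#gmail.com joins only once,
# then count and sum per group.
QUERY = """
SELECT e.Unsub, COUNT(*) AS EmailCount, SUM(c."sum") AS EmailSum
FROM (SELECT DISTINCT email, Unsub FROM EmailsTable) e
JOIN CountTable c ON c.email = e.email
GROUP BY e.Unsub
ORDER BY e.Unsub
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('N', 2, 50), ('Y', 1, 10)]
```

Joining before de-duplicating would match a#gmail.com twice and report Y as 2 and 20, which is the double counting the question wants to avoid.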

Sql two table query most duplicated foreign key

I have two tables, sport and student:
First table, sport:
| idsport | name       |
|---------|------------|
| 1       | bobsled    |
| 2       | skating    |
| 3       | boarding   |
| 4       | iceskating |
| 5       | skiing     |
Second table, student (sport_idsport is a foreign key to sport.idsport):
| idstudent | name  | sport_idsport |
|-----------|-------|---------------|
| 1         | john  | 3             |
| 2         | pauly | 2             |
| 3         | max   | 1             |
| 4         | jane  | 2             |
| 5         | nico  | 5             |
So far I have this query, which outputs the most frequently inserted sport ID, but I can't get it to work with two tables:
SELECT sport_idsport
FROM (SELECT sport_idsport FROM student GROUP BY sport_idsport ORDER BY COUNT(*) desc)
WHERE ROWNUM<=1;
I need to output the name of the most popular sport; in this case it would be skating.
I am using Oracle SQL.
with counter as (
Select sport_idsport,
count(*) as cnt,
dense_rank() over (order by count(*) desc) as rn
from student
group by sport_idsport
)
select s.*, c.cnt
from sport s
join counter c on c.sport_idsport = s.idsport and c.rn = 1;
SQLFiddle example: http://sqlfiddle.com/#!4/b76e21/1
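A minimal sketch of the DENSE_RANK() answer, run in SQLite through Python's sqlite3 module instead of Oracle (an assumption; the CTE and window syntax carry over unchanged). Because every sport tied for the top count gets rn = 1, this form returns all of them on a tie, whereas a ROWNUM-based query returns just one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sport (idsport INTEGER PRIMARY KEY, name TEXT);
INSERT INTO sport VALUES
  (1, 'bobsled'), (2, 'skating'), (3, 'boarding'), (4, 'iceskating'), (5, 'skiing');

CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT,
                      sport_idsport INTEGER REFERENCES sport(idsport));
INSERT INTO student VALUES
  (1, 'john', 3), (2, 'pauly', 2), (3, 'max', 1), (4, 'jane', 2), (5, 'nico', 5);
""")

# Count students per sport, rank the counts (ties share rank 1),
# then join back to sport to get the name.
QUERY = """
WITH counter AS (
  SELECT sport_idsport,
         COUNT(*) AS cnt,
         DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rn
  FROM student
  GROUP BY sport_idsport
)
SELECT s.name, c.cnt
FROM sport s
JOIN counter c ON c.sport_idsport = s.idsport AND c.rn = 1
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('skating', 2)]
```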
select cnt, sport_idsport from (
select count(*) cnt, sport_idsport
from student
group by sport_idsport
order by count(*) desc
)
where rownum = 1