SQL MIN across multiple columns with a GROUP BY

I have a table with the following data:
+----+---+---+
| ID | X | Y |
+----+---+---+
| A | 1 | 3 |
| A | 1 | 1 |
| A | 1 | 2 |
| A | 1 | 7 |
| B | 2 | 2 |
| B | 3 | 3 |
| B | 1 | 9 |
| B | 2 | 4 |
| B | 2 | 1 |
| C | 1 | 1 |
+----+---+---+
I'd like to select the minimum across both columns, grouping by the first column; the X column takes priority over the Y column. For example, the query should return something like this:
+----+---+---+
| ID | X | Y |
+----+---+---+
| A | 1 | 1 |
| B | 1 | 9 |
| C | 1 | 1 |
+----+---+---+
Any ideas? I've gone through dozens of posts and experiments with no luck so far.
Thanks,
James

You seem to want the row that has the minimum x value and, if there are ties on x, the one among them with the minimum y.
For this, use row_number():
select id, x, y
from (select t.*,
row_number() over (partition by id order by x, y) as seqnum
from t
) t
where seqnum = 1
If your database does not support window functions, you can still express this in SQL:
select t.id, t.x, min(t.y)
from t join
(select id, MIN(x) as minx
from t
group by id
) tmin
on t.id = tmin.id and t.x = tmin.minx
group by t.id, t.x
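To check both queries against the sample data, here is a small sketch that runs them in an in-memory SQLite database via Python's standard `sqlite3` module. It assumes the table is named `t`, as in the answer, and an SQLite build of 3.25 or later (the first version with window functions); SQLite is only a stand-in for whatever RDBMS you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id TEXT, x INTEGER, y INTEGER);
INSERT INTO t VALUES
  ('A',1,3),('A',1,1),('A',1,2),('A',1,7),
  ('B',2,2),('B',3,3),('B',1,9),('B',2,4),('B',2,1),
  ('C',1,1);
""")

# Window-function version: number rows per id ordered by x, then y; keep row 1.
window_query = """
SELECT id, x, y
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY x, y) AS seqnum
      FROM t) ranked
WHERE seqnum = 1
ORDER BY id
"""

# Fallback version: keep only each id's minimum x, then take MIN(y) among those rows.
join_query = """
SELECT t.id, t.x, MIN(t.y)
FROM t
JOIN (SELECT id, MIN(x) AS minx FROM t GROUP BY id) tmin
  ON t.id = tmin.id AND t.x = tmin.minx
GROUP BY t.id, t.x
ORDER BY t.id
"""

window_rows = conn.execute(window_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(window_rows)
print(join_rows)
```

Both queries return A (1, 1), B (1, 9) and C (1, 1), matching the expected output in the question.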

If your RDBMS supports window functions:
SELECT ID, X, Y
FROM
(
SELECT ID, X, Y,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY X, Y) rn
FROM tableName
) d
WHERE rn = 1

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table, activations:
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count of the best-selling app for each referral as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using DISTINCT ON that gives the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count of the best-selling app for each referral as best_selling_app_count.
Your question is complicated, and so is your SQL query. However, the sentence above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
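As a sanity check on the sample data, here is a sketch in Python's `sqlite3`. SQLite has no DISTINCT ON, so this swaps in ROW_NUMBER() to keep the top-ranked app per referral; the tables are trimmed to the columns needed. The extra `app_id` tie-breaker in the ORDER BY is my addition, not part of the answer: it makes ties resolve the same way on every run, which speaks to the reliability concern in the question. In PostgreSQL the equivalent would be ordering the DISTINCT ON subquery by `a.referral_id, count(*) desc, a.app_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referrals (id INTEGER, user_id_owner INTEGER, firstname TEXT);
INSERT INTO referrals VALUES (3,2,'c'),(5,3,'e'),(4,1,'d'),(2,1,'b'),(1,1,'a');
CREATE TABLE activations (id INTEGER, referral_id INTEGER, app_id TEXT);
INSERT INTO activations VALUES
  (2,3,'a'),(4,1,'b'),(5,4,'c'),(1,2,'b'),(3,2,'b'),(6,2,'a');
""")

# Count activations per (referral, app), rank apps within each referral by
# count (ties broken by app_id), and LEFT JOIN only the rank-1 app.
query = """
SELECT r.id, a.app_id AS most_common_app_id, a.cnt AS most_common_app_id_count
FROM referrals r
LEFT JOIN (
    SELECT referral_id, app_id, cnt,
           ROW_NUMBER() OVER (PARTITION BY referral_id
                              ORDER BY cnt DESC, app_id) AS rn
    FROM (SELECT referral_id, app_id, COUNT(*) AS cnt
          FROM activations
          GROUP BY referral_id, app_id) counts
) a ON a.referral_id = r.id AND a.rn = 1
ORDER BY r.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Referral 2 gets app b with a count of 2, and referral 5, which has no activations, keeps NULLs because of the LEFT JOIN.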
You have not explained the other columns that are in your result set.

Count the number of appearances of char given an ID

I have to perform a query where I can count the number of distinct codes per Id.
|Id | Code
------------
| 1 | C
| 1 | I
| 2 | I
| 2 | C
| 2 | D
| 2 | D
| 3 | C
| 3 | I
| 3 | D
| 4 | I
| 4 | C
| 4 | C
The output should be something like:
|Id | Count | #Code C | #Code I | #Code D
-------------------------------------------
| 1 | 2 | 1 | 1 | 0
| 2 | 3 | 1 | 0 | 2
| 3 | 3 | 1 | 1 | 1
| 4 | 2 | 2 | 1 | 0
Can you give me some advice on this?
This answers the original version of the question.
You are looking for count(distinct):
select id, count(distinct code)
from t
group by id;
If the codes are limited to the ones shown (C, I and D), the following SQL Server query produces the desired result.
select
pvt.Id,
codes.total As [Count],
COALESCE(C, 0) AS [#Code C],
COALESCE(I, 0) AS [#Code I],
COALESCE(D, 0) AS [#Code D]
from
( select Id, Code, Count(code) cnt
from t
Group by Id, Code) s
PIVOT(MAX(cnt) FOR Code IN ([C], [I], [D])) pvt
join (select Id, count(distinct Code) total from t group by Id) codes on pvt.Id = codes.Id ;
Note: as the sample input data shows, code 'I' occurs for every Id, yet its count is zero for Id = 2 in the expected output in the question.
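Here is the correct output for the sample data:
|Id | Count | #Code C | #Code I | #Code D
-------------------------------------------
| 1 | 2 | 1 | 1 | 0
| 2 | 3 | 1 | 1 | 2
| 3 | 3 | 1 | 1 | 1
| 4 | 2 | 2 | 1 | 0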
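PIVOT is SQL Server syntax. A common portable alternative, swapped in here rather than taken from the answer, is conditional aggregation: one SUM(CASE ...) per known code. This sketch runs it on the sample data with Python's `sqlite3`; the table name `t` and the lowercase column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, code TEXT);
INSERT INTO t VALUES
  (1,'C'),(1,'I'),
  (2,'I'),(2,'C'),(2,'D'),(2,'D'),
  (3,'C'),(3,'I'),(3,'D'),
  (4,'I'),(4,'C'),(4,'C');
""")

# Each SUM(CASE ...) counts one code; ELSE 0 makes absent codes come out as 0,
# so no COALESCE is needed.
query = """
SELECT id,
       COUNT(DISTINCT code)                        AS code_count,
       SUM(CASE WHEN code = 'C' THEN 1 ELSE 0 END) AS code_c,
       SUM(CASE WHEN code = 'I' THEN 1 ELSE 0 END) AS code_i,
       SUM(CASE WHEN code = 'D' THEN 1 ELSE 0 END) AS code_d
FROM t
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The trade-off is the same as with PIVOT: every code must be listed explicitly, so a new code means editing the query.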

How to select a random record for each group

I have a table like
| A | B | C | D |
|--------|---|---|---|
| Value1 | x | x | x |
| Value1 | y | x | y |
| Value1 | x | x | x |
| .... |
| Value2 | x | x | x |
| Value2 | x | x | x |
| Value2 | x | x | x |
| .... |
| Value3 | x | x | x |
| Value3 | x | x | x |
| Value3 | x | x | x |
where column A takes one value from a set. I want to get a random record for each unique value in column A.
You can use window functions:
select *
from (
select
t.*,
row_number() over(partition by a order by random()) rn
from mytable t
) t
where rn = 1
row_number() assigns a random rank to each record within groups having the same a; then, the outer query filters one record per group.
Actually, since you are running Postgres, you could just as well use distinct on, which may perform better (and has a shorter syntax):
select distinct on (a) t.*
from mytable t
order by a, random();
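Here is a sketch of the window-function version run through Python's `sqlite3` on the rows shown in the question (the elided "...." rows are left out). It assumes SQLite 3.25 or later for ROW_NUMBER(); SQLite's `random()` plays the role of PostgreSQL's `random()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (a TEXT, b TEXT, c TEXT, d TEXT);
INSERT INTO mytable VALUES
  ('Value1','x','x','x'),('Value1','y','x','y'),('Value1','x','x','x'),
  ('Value2','x','x','x'),('Value2','x','x','x'),('Value2','x','x','x'),
  ('Value3','x','x','x'),('Value3','x','x','x'),('Value3','x','x','x');
""")

# Shuffle rows within each value of a via a random sort key, then keep rank 1.
query = """
SELECT a, b, c, d
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY a ORDER BY random()) AS rn
      FROM mytable t) ranked
WHERE rn = 1
ORDER BY a
"""
all_rows = set(conn.execute("SELECT a, b, c, d FROM mytable").fetchall())
picked = conn.execute(query).fetchall()
print(picked)
```

Each run returns exactly one row per value of `a`; for `Value1` it will sometimes be the `y` row and sometimes an `x` row.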
You can do it with distinct on:
select distinct on (a) a, b, c, d
from test t
order by a, random();
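With DISTINCT ON, you tell PostgreSQL to return a single row for each distinct group defined by the ON clause. Which row is returned is decided by the ORDER BY clause; without one, it is unpredictable rather than random.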
More about that subject here: https://www.geekytidbits.com/postgres-distinct-on/
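The rule behind DISTINCT ON (sort by the ON expression plus the ORDER BY, then keep the first row of each group) can be sketched in plain Python. The `distinct_on` helper below is hypothetical, written only to illustrate those semantics; it is not a PostgreSQL API.

```python
import random
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order):
    """Hypothetical emulation of SELECT DISTINCT ON (key) ... ORDER BY key, order.

    Rows are sorted by (key, order) and only the first row of each key
    group is kept, which is the row PostgreSQL returns for that group.
    """
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [
    ("Value1", "y", "x", "y"),
    ("Value1", "x", "x", "x"),
    ("Value2", "x", "x", "x"),
    ("Value3", "x", "x", "x"),
]

# Deterministic, like ORDER BY a, b: the "x" row wins for Value1.
first_per_a = distinct_on(rows, key=itemgetter(0), order=itemgetter(1))
print(first_per_a)

# Random, like ORDER BY a, random(): each row gets one random sort key.
random_pick = distinct_on(rows, key=itemgetter(0), order=lambda r: random.random())
print(random_pick)
```

The sort is what makes the choice of row well defined, which is why DISTINCT ON without a matching ORDER BY gives no guarantee about which row you get.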

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
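The cumulative sum works because it is a running total of a 0/1 flag: the total ticks up by one exactly on rows where State is 0. Here is a plain-Python sketch of that logic on the sample data, using `itertools.accumulate` in place of `SUM(...) OVER (ORDER BY id)` and assuming the rows are already in Id order.

```python
from itertools import accumulate

ids = list(range(1, 12))
states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]

# CASE WHEN state = 0 THEN 1 ELSE 0 END, evaluated per row.
flags = [1 if s == 0 else 0 for s in states]

# Running total in Id order, i.e. SUM(flag) OVER (ORDER BY id).
expected = list(accumulate(flags))
print(list(zip(ids, states, expected)))
```

The result reproduces the Expected column from the question.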
Another method uses a correlated subquery:
select t.*,
(select count(*)
from tblState t1
where t1.id <= t.id and t1.state = 0
) as expected
from tblState t;
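Here is a sketch that runs a correlated-subquery version against the sample data with Python's `sqlite3`, using the question's table name `tblState`. The comparison has to be `<=` so the current row counts itself when its State is 0; with `<`, Id 1 would get 0 instead of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INTEGER PRIMARY KEY, State INTEGER)")
conn.executemany(
    "INSERT INTO tblState VALUES (?, ?)",
    enumerate([0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1], start=1),
)

# For each row, count the State = 0 rows up to and including its Id.
query = """
SELECT t.Id, t.State,
       (SELECT COUNT(*)
        FROM tblState t1
        WHERE t1.Id <= t.Id AND t1.State = 0) AS expected
FROM tblState t
ORDER BY t.Id
"""
rows = conn.execute(query).fetchall()
expected = [row[2] for row in rows]
print(expected)
```

Because the subquery is evaluated once per row, it is likely to scale worse than the windowed SUM on a large table.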

SQL UPDATE only Duplicates

I have a SQL Table like this:
+------+------------+---------+---------+--------+
| id | x | y | z | status |
+------+------------+---------+---------+--------+
| 1 | bla | ja | 1 | 0 |
| 2 | blaa | jaa | 2 | 0 |
| 3 | bla | ja | 1 | 0 |
| 4 | blaaa | jaaa | 3 | 0 |
| 5 | blaa | jaa | 2 | 0 |
+------+------------+---------+---------+--------+
I want to UPDATE the status column of the duplicate rows only, not the first row of each group.
With the following statement I update every duplicate, including the first row of each group of duplicates:
UPDATE table INNER JOIN
(SELECT x, y, z FROM table GROUP BY x,y,z HAVING COUNT(id) > 1)
dup
ON table.x = dup.x && table.y = dup.y && table.z = dup.z
SET status = '1'
But that's not right, because after the UPDATE statement the table should look like this:
+------+------------+---------+---------+--------+
| id | x | y | z | status |
+------+------------+---------+---------+--------+
| 1 | bla | ja | 1 | 0 |
| 2 | blaa | jaa | 2 | 0 |
| 3 | bla | ja | 1 | 1 |
| 4 | blaaa | jaaa | 3 | 0 |
| 5 | blaa | jaa | 2 | 1 |
+------+------------+---------+---------+--------+
I hope you can help me.
Thanks a lot.
Play with a SELECT statement like the one below until it lists the duplicates, then update as shown.
UPDATE [table] SET status = '1'
WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY x, y, z, status ORDER BY id) AS dup
FROM [table]) d
WHERE dup > 1);
You didn't say which RDBMS you use, so this is for SQL Server.
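Here is a sketch of the same ROW_NUMBER() idea run in SQLite through Python's `sqlite3` (3.25 or later needed for window functions). The table is named `t` because `table` is a reserved word, and it partitions by x, y, z only; with every status starting at 0 that gives the same result as also partitioning by status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, x TEXT, y TEXT, z INTEGER, status INTEGER);
INSERT INTO t VALUES
  (1,'bla','ja',1,0),(2,'blaa','jaa',2,0),(3,'bla','ja',1,0),
  (4,'blaaa','jaaa',3,0),(5,'blaa','jaa',2,0);
""")

# Number rows within each (x, y, z) group by id; anything numbered above 1
# is a later duplicate and gets flagged.
conn.execute("""
UPDATE t SET status = 1
WHERE id IN (SELECT id
             FROM (SELECT id,
                          ROW_NUMBER() OVER (PARTITION BY x, y, z ORDER BY id) AS dup
                   FROM t) numbered
             WHERE dup > 1)
""")
statuses = [row[0] for row in conn.execute("SELECT status FROM t ORDER BY id")]
print(statuses)
```

Only ids 3 and 5 are flagged, matching the target table in the question.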
I believe this is what you want:
UPDATE table t INNER JOIN
(SELECT x, y, z, MIN(id) as minid
FROM table
GROUP BY x, y, z
HAVING COUNT(id) > 1 -- not strictly necessary, but why not?
) dup
ON t.x = dup.x AND t.y = dup.y AND t.z = dup.z AND
t.id > dup.minid
SET status = 1;
This calculates the minimum id for each group and then updates all the other rows.
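SQLite's UPDATE does not accept the MySQL-style JOIN used above, so this sketch expresses the same minimum-id idea as a correlated subquery and runs it through Python's `sqlite3`. The table name `t` is an assumption standing in for the question's `table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, x TEXT, y TEXT, z INTEGER, status INTEGER);
INSERT INTO t VALUES
  (1,'bla','ja',1,0),(2,'blaa','jaa',2,0),(3,'bla','ja',1,0),
  (4,'blaaa','jaaa',3,0),(5,'blaa','jaa',2,0);
""")

# Flag every row whose id is above the smallest id of its (x, y, z) group.
# Rows that are alone in their group equal their own minimum and stay 0.
conn.execute("""
UPDATE t SET status = 1
WHERE id > (SELECT MIN(t2.id)
            FROM t AS t2
            WHERE t2.x = t.x AND t2.y = t.y AND t2.z = t.z)
""")
statuses = [row[0] for row in conn.execute("SELECT status FROM t ORDER BY id")]
print(statuses)
```

As in the answer, the HAVING COUNT(id) > 1 filter is not needed: a group with a single row never has an id greater than its own minimum.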