distinct rows with group by - sql

I have one table:
id_object | version | document
------------------------------
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
I want to show only one row per object, with its maximum version and the corresponding document. I have tried the following:
Select Distinct
id_object ,
Max(version),
document
From
prods
Group By
id_object, document
and I get this result
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
As you can see, I'm getting the entire table. My question is, why?

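Because you group by both id_object and document, you get one row per distinct (id_object, document) pair rather than one row per id_object. Every such pair in your table is unique, so nothing is collapsed and the whole table comes back.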
select x.id_object,
x.maxversion as version,
p.document
from
(
Select id_object, Max(version) as maxversion
From prods
Group By id_object
) x
inner join prods p on p.id_object = x.id_object
and p.version = x.maxversion
You first have to select the id_object with the max(version). That can be joined with the actual data to get the correct document.
You have to do that because you can't select columns that are not in your group by clause unless you apply an aggregate function to them (max(), for instance).
(MySQL can select non-aggregated columns, but please avoid that, since the outcome is not always clear or even predictable.)
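To make the difference concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (SQLite and the Python harness are assumptions made only so the example runs anywhere; the question does not name a database). It runs the original grouping and then the group-then-join query from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prods (id_object INTEGER, version INTEGER, document INTEGER)")
conn.executemany(
    "INSERT INTO prods VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 1), (2, 3, 2), (1, 1, 3)],
)

# Grouping by (id_object, document): every pair is unique in the sample data,
# so each group contains exactly one row and nothing is collapsed.
per_pair = conn.execute(
    "SELECT id_object, MAX(version), document FROM prods GROUP BY id_object, document"
).fetchall()

# Find the max version per object first, then join back to fetch the document.
latest = conn.execute("""
    SELECT x.id_object, x.maxversion AS version, p.document
    FROM (SELECT id_object, MAX(version) AS maxversion
          FROM prods
          GROUP BY id_object) x
    JOIN prods p ON p.id_object = x.id_object AND p.version = x.maxversion
    ORDER BY x.id_object
""").fetchall()

print(len(per_pair))  # 6
print(latest)         # [(1, 2, 2), (2, 3, 2)]
```

Note that if two rows of the same object share the maximum version, the join returns both of them.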

select prods.id_object, version, document
from prods inner join
(select id_object, max(version) as ver
from prods
group by id_object) tmp on prods.id_object = tmp.id_object and prods.version = tmp.ver

Query:
SELECT p.id_object,
p.version,
p.document
FROM prods p
WHERE p.version = (SELECT Max(version)
FROM prods
WHERE id_object = p.id_object)
Result:
| ID_OBJECT | VERSION | DOCUMENT |
----------------------------------
| 1 | 2 | 2 |
| 2 | 3 | 2 |

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround by harnessing distinct that gave desired result but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
"I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count."
Your question is really complicated with a very complicated SQL query. However, the above is what looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
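DISTINCT ON is specific to Postgres. As a rough sketch of the same "top row per group" idea in a portable form, the example below swaps in ROW_NUMBER() and runs it on an in-memory SQLite database (an assumption for runnability; window functions need SQLite 3.25 or later). The tables are trimmed to the columns the query needs, and the extra app_id tie-breaker in the ORDER BY is my addition, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE referrals (id INTEGER PRIMARY KEY, user_id_owner INTEGER);
    CREATE TABLE activations (id INTEGER PRIMARY KEY, referral_id INTEGER, app_id TEXT);
    INSERT INTO referrals VALUES (1, 1), (2, 1), (3, 2), (4, 1), (5, 3);
    INSERT INTO activations VALUES
        (1, 2, 'b'), (2, 3, 'a'), (3, 2, 'b'), (4, 1, 'b'), (5, 4, 'c'), (6, 2, 'a');
""")

# Count activations per (referral, app), rank the apps within each referral,
# and keep only the top-ranked app. app_id breaks ties deterministically.
rows = conn.execute("""
    SELECT r.id, a.app_id AS most_common_app_id, a.cnt AS most_common_app_id_count
    FROM referrals r
    LEFT JOIN (
        SELECT referral_id, app_id, cnt,
               ROW_NUMBER() OVER (PARTITION BY referral_id
                                  ORDER BY cnt DESC, app_id) AS rn
        FROM (SELECT referral_id, app_id, COUNT(*) AS cnt
              FROM activations
              GROUP BY referral_id, app_id) counts
    ) a ON a.referral_id = r.id AND a.rn = 1
    ORDER BY r.id
""").fetchall()

print(rows)
# [(1, 'b', 1), (2, 'b', 2), (3, 'a', 1), (4, 'c', 1), (5, None, None)]
```

Each referral appears exactly once; referral 5 has no activations, so the LEFT JOIN leaves its app columns NULL.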

How to get count from one table which is mutually dependent to another table

I have two tables. The first is QC_Meeting_Master and the second is QC_Project_Master. I want to calculate the count of Problem_ID in the second table for each Problems_ID in the first table.
QC_Meeting_Master:
ID | QC_ID | Problems_ID |
___|_______|_____________|
1 | 1 | 2 |
2 | 1 | 7 |
QC_Project_Master:
ID | QC_ID | Problem_ID |
___|_______|_____________|
1 | 1 | 7 |
2 | 1 | 7 |
3 | 1 | 7 |
4 | 1 | 7 |
5 | 1 | 2 |
6 | 1 | 2 |
7 | 1 | 2 |
select COUNT(Problem_ID) from [QC_Project_Master] where Problem_ID in
(select Problems_ID from QC_Meeting_Master QMM join QC_Project_Master QPM on QMM.Problems_ID = QPM.Problem_ID)
I have to count the rows of QC_Project_Master (Problem_ID) for each QC_Meeting_Master (Problems_ID).
That means: for QC_Meeting_Master (Problems_ID) = 2 the count should be 3,
and for QC_Meeting_Master (Problems_ID) = 7 the count should be 4.
Use conditional aggregation:
select sum(case when t2.Problem_ID=2 then 1 else 0 end),
sum(case when t2.Problem_ID=7 then 1 else 0 end) from
QC_Meeting_Master t1 join QC_Project_Master t2 on t1.QC_ID=t2.QC_ID and t1.Problems_ID=t2.Problem_ID
If you need the count for every group, use:
select t2.QC_ID,t2.Problem_ID, count(*) from
QC_Meeting_Master t1 join QC_Project_Master t2
on t1.QC_ID=t2.QC_ID and t1.Problems_ID=t2.Problem_ID
group by t2.QC_ID,t2.Problem_ID
As far as I understand your problem, this is a simple aggregation and JOIN (note that the column is Problems_ID in QC_Meeting_Master but Problem_ID in QC_Project_Master):
SELECT mm.QC_ID, mm.Problems_ID, pm.cnt
FROM QC_Meeting_Master mm
INNER JOIN
(
SELECT QC_ID, Problem_ID, COUNT(*) cnt
FROM QC_Project_Master
GROUP BY QC_ID, Problem_ID
) pm
ON pm.QC_ID = mm.QC_ID AND pm.Problem_ID = mm.Problems_ID;
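As a quick check of the aggregate-then-join approach against the sample rows, here is a small sketch on an in-memory SQLite database (SQLite and the Python wrapper are assumptions for runnability; the bracketed identifiers in the question suggest a different engine). The column names follow the tables in the question: Problems_ID in QC_Meeting_Master and Problem_ID in QC_Project_Master.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QC_Meeting_Master (ID INTEGER, QC_ID INTEGER, Problems_ID INTEGER);
    CREATE TABLE QC_Project_Master (ID INTEGER, QC_ID INTEGER, Problem_ID INTEGER);
    INSERT INTO QC_Meeting_Master VALUES (1, 1, 2), (2, 1, 7);
    INSERT INTO QC_Project_Master VALUES
        (1, 1, 7), (2, 1, 7), (3, 1, 7), (4, 1, 7), (5, 1, 2), (6, 1, 2), (7, 1, 2);
""")

# Count project rows per (QC_ID, Problem_ID), then attach each count
# to the matching meeting row.
counts = conn.execute("""
    SELECT mm.QC_ID, mm.Problems_ID, pm.cnt
    FROM QC_Meeting_Master mm
    JOIN (SELECT QC_ID, Problem_ID, COUNT(*) AS cnt
          FROM QC_Project_Master
          GROUP BY QC_ID, Problem_ID) pm
      ON pm.QC_ID = mm.QC_ID AND pm.Problem_ID = mm.Problems_ID
    ORDER BY mm.Problems_ID
""").fetchall()

print(counts)  # [(1, 2, 3), (1, 7, 4)]
```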

Find duplicate combinations

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}
If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute > c.id_product_attribute and
c2.combo = c.combo
);
Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
Sorting can be inlined into the aggregate function so we don't need a subquery for the sort (as @Gordon already showed). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.
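For readers without Postgres at hand, the same interpretation (instances whose attribute set equals that of an earlier instance of the same product) can be sketched without array_agg. This is a swapped-in technique, not the queries above: two instances are treated as duplicates when they have the same number of attributes and share all of them. It assumes each (id_product_attribute, id_attribute) pair occurs only once, and it runs on an in-memory SQLite database purely for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductAttribute (id INTEGER, id_product INTEGER, price INTEGER);
    CREATE TABLE ProductAttributeCombinations
        (id_product_attribute INTEGER, id_attribute INTEGER);
    INSERT INTO ProductAttribute VALUES
        (1, 1, 100), (2, 1, 200), (3, 1, 100), (4, 1, 200),
        (5, 1, 100), (6, 1, 200), (7, 1, 100), (8, 1, 100);
    INSERT INTO ProductAttributeCombinations VALUES
        (1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
        (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5);
""")

rows = conn.execute("""
    WITH sizes AS (
        SELECT id_product_attribute AS id, COUNT(*) AS n
        FROM ProductAttributeCombinations
        GROUP BY id_product_attribute
    )
    SELECT DISTINCT pb.id_product, pb.id
    FROM ProductAttributeCombinations a
    JOIN ProductAttributeCombinations b
      ON b.id_attribute = a.id_attribute
     AND b.id_product_attribute > a.id_product_attribute  -- compare with earlier ids only
    JOIN ProductAttribute pa ON pa.id = a.id_product_attribute
    JOIN ProductAttribute pb ON pb.id = b.id_product_attribute
                            AND pb.id_product = pa.id_product
    JOIN sizes sa ON sa.id = a.id_product_attribute
    JOIN sizes sb ON sb.id = b.id_product_attribute AND sb.n = sa.n
    GROUP BY pa.id, pb.id, pb.id_product, sa.n
    HAVING COUNT(*) = sa.n          -- every attribute of the earlier instance is shared
    ORDER BY pb.id_product, pb.id
""").fetchall()

# Collect the duplicate instance ids per product.
duplicates = {}
for id_product, attribute_id in rows:
    duplicates.setdefault(id_product, []).append(attribute_id)

print(duplicates)  # {1: [7, 8]}
```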

Why isn't this returning unique combinations of these attributes?

When using the following query:
with neededSkills(SkillCode) as (
select distinct SkillCode
from job natural join hasprofile natural join requires_skill
where job_code = '1'
minus
select skillcode
from person natural join hasskill
where id = '1'
)
select distinct
taughtin.c_code as c,
count(taughtin.skillcode) as s,
ti.c_code as cc,
count(ti.skillcode) as ss
from taughtin, taughtin ti
where taughtin.c_code <> ti.c_code
and taughtin.skillcode <> ti.skillcode
and taughtin.skillcode in (select skillcode from neededskills)
and ti.skillcode in (select skillcode from neededskills)
group by (taughtin.c_code, ti.c_code)
order by (taughtin.c_code);
It returns:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
2 | 1 | 1 | 1
3 | 1 | 1 | 1
5 | 1 | 1 | 1
I would expect it to return only lines where the combination of C and CC was not already used. Do I misunderstand how group by works? How would I achieve this result?
I am trying to have it return:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
I use Oracle SQLPlus.
You're grouping on the combination of taughtin.c_code and ti.c_code, which are separate columns in the context of the query (even though they are the same column in the schema). A pair of 1, 2 is not the same as a pair of 2, 1; the values may be the same but the sources are not.
If you want to get the combinations one way but not the other, then the simplest thing is to always require one value to be larger than the other; instead of:
where taughtin.c_code <> ti.c_code
use:
where ti.c_code > taughtin.c_code
It would also be better to use ANSI joins for the main query, and I'm not a fan of natural joins. You don't need either distinct: the first may eliminate duplicates, but they don't logically matter if you're only using the temporary result set for in(), and the second is redundant because the group by already returns one row per pair.
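The effect of replacing <> with > in a self-join is easy to see on a tiny example. The sketch below uses a hypothetical three-row taughtin table in an in-memory SQLite database (the data and the use of SQLite are assumptions; the question uses Oracle and a larger schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taughtin (c_code INTEGER, skillcode TEXT)")
conn.executemany(
    "INSERT INTO taughtin VALUES (?, ?)",
    [(1, "s1"), (2, "s2"), (3, "s3")],  # hypothetical sample rows
)

# '<>' keeps both orderings of every pair: (1, 2) and (2, 1) are distinct rows.
both_ways = conn.execute("""
    SELECT t.c_code, ti.c_code
    FROM taughtin t
    JOIN taughtin ti ON t.c_code <> ti.c_code
    ORDER BY 1, 2
""").fetchall()

# '>' keeps each unordered pair exactly once.
one_way = conn.execute("""
    SELECT t.c_code, ti.c_code
    FROM taughtin t
    JOIN taughtin ti ON ti.c_code > t.c_code
    ORDER BY 1, 2
""").fetchall()

print(both_ways)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(one_way)    # [(1, 2), (1, 3), (2, 3)]
```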

Find matching set of records

I have the following tables on SQL Server 2008R2
MessageTable
ContrlNo| LineNo | Msg
1 | 1 | Tiger1 Text
1 | 2 | Tiger1 Text
1 | 3 | Tiger1 Text
1 | 4 | Tiger1 Text
2 | 1 | Tiger1 Text1
2 | 2 | Tiger1 Text2
2 | 3 | Tiger1 Text3
2 | 4 | Tiger1 Text4
3 | 1 | Horse 1
3 | 2 | Horse 2
3 | 3 | Horse 3
3 | 4 | Horse 4
RuleTable
RuleNo| MsgLineNo | RuleStartingPos | RuleMsg
1 | 1 | 1 | Tiger1 Text
2 | 1 | 1 | Tiger1 Text
2 | 3 | 1 | Tiger1 Text3
For each set of ContrlNo records in MessageTable, I would like to apply the rules from RuleTable and list the RuleNo if any rule matches.
As you can see in RuleTable, rule 2 overlaps rule 1. The requirement is to get the best-matching RuleNo for each control number. The expected result is:
ContrlNo | RuleNo
1 | 1
2 | 2
3 | NULL
Thanks,
Jay
I believe the following will retrieve the listed results: it will show a control number with the associated rule that has the most matching lines (if, as in your first case, two rules have an equal number of matches, it will check the MatchPercent).
SELECT MT.ContrlNo, r.RuleNo, r.MatchPercent
FROM
MessageTable MT
LEFT JOIN
(
SELECT
ContrlNo,
RuleNo,
MatchedRules / AvailableRules AS MatchPercent,
ROW_NUMBER() OVER (PARTITION BY ContrlNo ORDER BY MatchedRules DESC, MatchedRules / AvailableRules DESC) AS rn
FROM
(
SELECT
ContrlNo,
R.RuleNo,
COUNT(*) as MatchedRules,
(SELECT COUNT(*) FROM RuleTable WHERE RuleTable.RuleNo = R.RuleNo) + 0.0 AS AvailableRules
FROM
MessageTable M
INNER JOIN
RuleTable R ON
M.[LineNo] = R.MsgLineNo AND
SUBSTRING(M.Msg,R.RuleStartingPos,LEN(R.RuleMsg)) LIKE '%' + R.RuleMsg + '%'
GROUP BY M.ContrlNo, R.RuleNo
) q
) r ON
MT.ContrlNo = r.ContrlNo AND
r.rn = 1
GROUP BY MT.ContrlNo, r.RuleNo, r.MatchPercent
The first subquery gets the number of matched rules and the second subquery gets the total number of rules. When the two counts match, we show the RuleNo; otherwise the LEFT JOIN yields NULL.
Select A.ContrlNo, ISNULL(T.RuleNo,0) as RuleNo FROM
( select ContrlNo, COUNT(R.RuleNo) as MatchedRules
FROM Messages M
LEFT JOIN Rules R
on M.[LineNo] = R.MsgLineNo
and SUBSTRING(M.Msg,R.RuleStartingPos,LEN(R.RuleMsg)) = R.RuleMsg
AND M.ContrlNo = R.RuleNo
GROUP BY M.ContrlNo) A
LEFT JOIN (
select COUNT(MsgLineNo) as TotalRules, RuleNo
from Rules R1
group by RuleNo) T
ON A.MatchedRules = T.TotalRules