SUM(), GROUP BY, and WHERE - sql

I have a table like this:
Title | Version | Condition | Count
-----------------------------------------------
Title1 | 1.0 | 1 | 10
Title1 | 1.1 | 2 | 5
Title1 | 1.1 | 2 | 10
Title1 | 1.1 | 1 | 10
Title2 | 1.0 | 2 | 10
Title2 | 1.5 | 1 | 5
Title2 | 1.5 | 2 | 5
Title3 | 1.5 | 2 | 10
Title3 | 1.5 | 1 | 10
For each Title, I would like to sum the value of "Count" over the rows that have "Condition" = 2 and the MAX() "Version" among those rows. I'd like this to be the resulting data set:
Title | Version | Condition | Count
-----------------------------------------------
Title1 | 1.1 | 2 | 15
Title2 | 1.5 | 2 | 5
Title3 | 1.5 | 2 | 10
I am able to get the list of "Title" and MAX("Version") with "Condition" = 2 with:
SELECT DISTINCT Title, MAX(Version) AS MaxVer FROM TABLE
WHERE Condition = 2
GROUP BY Title
But I'm not sure how to add up all the "Count"s.

Try this:
SELECT t1.Title, t1.Version, t1.Condition, SUM([Count]) AS Count
FROM mytable AS t1
JOIN (
SELECT Title, MAX(Version) AS max_version
FROM mytable
WHERE Condition = 2
GROUP BY Title
) AS t2 ON t1.Title = t2.Title AND t1.Version = t2.max_version
WHERE t1.Condition = 2
GROUP BY t1.Title, t1.Version, t1.Condition

This needs only a single access to the table:
SELECT Title, Version, Condition, Count
FROM
( -- calculate all SUMs
SELECT Title, Version, Condition, SUM(Count) AS Count,
ROW_NUMBER() OVER (PARTITION BY Title ORDER BY Version DESC) AS rn
FROM mytable -- TABLE is a reserved word; use your table's name
WHERE Condition = 2
GROUP BY Title, Version, Condition
) AS dt
-- and return only the row with the highest Version
WHERE rn = 1
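To check that the JOIN answer and the ROW_NUMBER() answer agree, here is a minimal sketch that runs both against the sample data with Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later), a table named mytable, and Version stored as a REAL. The [Count] bracket quoting is SQL Server syntax, so the sketch double-quotes "Condition" and "Count" instead.

```python
import sqlite3

# Sample rows from the question: (Title, Version, Condition, Count)
ROWS = [
    ("Title1", 1.0, 1, 10),
    ("Title1", 1.1, 2, 5),
    ("Title1", 1.1, 2, 10),
    ("Title1", 1.1, 1, 10),
    ("Title2", 1.0, 2, 10),
    ("Title2", 1.5, 1, 5),
    ("Title2", 1.5, 2, 5),
    ("Title3", 1.5, 2, 10),
    ("Title3", 1.5, 1, 10),
]

# Answer 1: find the max Version per Title (Condition = 2 only), join back, then sum
JOIN_QUERY = """
SELECT t1.Title, t1.Version, t1."Condition", SUM(t1."Count") AS "Count"
FROM mytable AS t1
JOIN (
    SELECT Title, MAX(Version) AS max_version
    FROM mytable
    WHERE "Condition" = 2
    GROUP BY Title
) AS t2 ON t1.Title = t2.Title AND t1.Version = t2.max_version
WHERE t1."Condition" = 2
GROUP BY t1.Title, t1.Version, t1."Condition"
ORDER BY t1.Title
"""

# Answer 2: sum per (Title, Version), then keep the highest Version per Title
ROW_NUMBER_QUERY = """
SELECT Title, Version, "Condition", "Count"
FROM (
    SELECT Title, Version, "Condition", SUM("Count") AS "Count",
           ROW_NUMBER() OVER (PARTITION BY Title ORDER BY Version DESC) AS rn
    FROM mytable
    WHERE "Condition" = 2
    GROUP BY Title, Version, "Condition"
) AS dt
WHERE rn = 1
ORDER BY Title
"""


def run(query):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE mytable (Title TEXT, Version REAL, "Condition" INTEGER, "Count" INTEGER)'
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


join_result = run(JOIN_QUERY)
row_number_result = run(ROW_NUMBER_QUERY)
print(join_result)
print(row_number_result)
```

Both print the three rows from the desired result set (Title1 with 15, Title2 with 5, Title3 with 10). The ROW_NUMBER() version reads the table once, which is the point the second answer makes.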

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count for each app as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that id 2 appears twice. I need each id to appear only once.
I tried a workaround using DISTINCT ON that gives the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of its columns, the count for each app as best_selling_app_count.
Your question is complicated, and so is your SQL query. However, the quoted sentence above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
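To try the "most common app per referral" idea locally, here is a sketch using Python's sqlite3 module. SQLite has no DISTINCT ON, so this swaps in ROW_NUMBER() over pre-computed counts to keep one row per referral_id; it is an equivalent technique, not the Postgres syntax from the answer. Ties are broken by app_id, which is an assumption (DISTINCT ON without a tiebreaker picks an arbitrary row among ties). It also assumes SQLite 3.25+ and includes only a few referrals columns for brevity.

```python
import sqlite3

# (id, user_id_owner, firstname) from the referrals table in the question
REFERRALS = [(3, 2, "c"), (5, 3, "e"), (4, 1, "d"), (2, 1, "b"), (1, 1, "a")]
# (id, user_id_owner, referral_id, amount_earned, activated_at, app_id)
ACTIVATIONS = [
    (2, 2, 3, 3.0, 3, "a"),
    (4, 1, 1, 6.0, 5, "b"),
    (5, 4, 4, 3.0, 6, "c"),
    (1, 1, 2, 2.0, 2, "b"),
    (3, 1, 2, 5.0, 4, "b"),
    (6, 1, 2, 7.0, 8, "a"),
]

QUERY = """
SELECT r.id,
       a.app_id AS most_common_app_id,
       a.cnt AS most_common_app_id_count
FROM referrals r
LEFT JOIN (
    -- keep only the top-ranked app per referral (stands in for DISTINCT ON)
    SELECT referral_id, app_id, cnt
    FROM (
        SELECT referral_id, app_id, cnt,
               ROW_NUMBER() OVER (
                   PARTITION BY referral_id ORDER BY cnt DESC, app_id
               ) AS rn
        FROM (
            -- how many activations each app has per referral
            SELECT referral_id, app_id, COUNT(*) AS cnt
            FROM activations
            GROUP BY referral_id, app_id
        ) AS counts
    ) AS ranked
    WHERE rn = 1
) a ON a.referral_id = r.id
ORDER BY r.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE referrals (id INTEGER PRIMARY KEY, user_id_owner INTEGER, firstname TEXT)")
conn.execute(
    "CREATE TABLE activations (id INTEGER PRIMARY KEY, user_id_owner INTEGER, "
    "referral_id INTEGER, amount_earned REAL, activated_at INTEGER, app_id TEXT)"
)
conn.executemany("INSERT INTO referrals VALUES (?, ?, ?)", REFERRALS)
conn.executemany("INSERT INTO activations VALUES (?, ?, ?, ?, ?, ?)", ACTIVATIONS)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```

Each referral id appears exactly once; referral 5 has no activations, so the LEFT JOIN leaves its app columns NULL.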

SQL select distinct when one column in and another column greater than

Consider the following dataset:
+---------------------+
| ID | NAME | VALUE |
+---------------------+
| 1 | a | 0.2 |
| 1 | b | 8 |
| 1 | c | 3.5 |
| 1 | d | 2.2 |
| 2 | b | 4 |
| 2 | c | 0.5 |
| 2 | d | 6 |
| 3 | a | 2 |
| 3 | b | 4 |
| 3 | c | 3.6 |
| 3 | d | 0.2 |
+---------------------+
I'm trying to write a SQL SELECT statement that returns the top or distinct ID where NAME 'a' and 'b' both exist and both of the corresponding VALUEs are >= 1. Thus, the desired output would be:
+---------------------+
| ID | NAME | VALUE |
+---------------------+
| 3 | a | 2 |
+---------------------+
Appreciate any assistance anyone can provide.
You can use the MIN window function with a CASE expression per name, then filter on the results:
SELECT * FROM (
SELECT *,
MIN(CASE WHEN NAME = 'a' THEN [value] end) OVER(PARTITION BY ID) aVal,
MIN(CASE WHEN NAME = 'b' THEN [value] end) OVER(PARTITION BY ID) bVal
FROM T
) t1
WHERE aVal >= 1 and bVal >= 1 and NAME = 'a'
This seems like a group by and having query:
select id
from t
where name in ('a', 'b')
group by id
having count(*) = 2 and
min(value) >= 1;
No subqueries or joins are necessary.
The where clause filters the data to only look at the "a" and "b" records. The count(*) = 2 checks that both exist. If you can have duplicates, then use count(distinct name) = 2.
Finally, you want the minimum value to be at least 1, so that is the final condition.
I am not sure why your desired results have the "a" row, but if you really want it, you can change the select to:
select id, 'a' as name,
max(case when name = 'a' then value end) as value
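Here is a small sketch of the GROUP BY/HAVING query run against the question's data with Python's sqlite3 module, assuming a table named t with columns id, name, and value. Note that the GROUP BY id line is required: without it, HAVING treats the whole filtered set as a single group.

```python
import sqlite3

# (id, name, value) from the question
ROWS = [
    (1, "a", 0.2), (1, "b", 8), (1, "c", 3.5), (1, "d", 2.2),
    (2, "b", 4), (2, "c", 0.5), (2, "d", 6),
    (3, "a", 2), (3, "b", 4), (3, "c", 3.6), (3, "d", 0.2),
]

QUERY = """
SELECT id
FROM t
WHERE name IN ('a', 'b')   -- only look at the "a" and "b" rows
GROUP BY id                -- one group per id
HAVING COUNT(*) = 2        -- both "a" and "b" exist
   AND MIN(value) >= 1     -- and neither value is below 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, value REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
matching_ids = [row[0] for row in conn.execute(QUERY)]
conn.close()
print(matching_ids)
```

ID 1 fails because its "a" value is 0.2, and ID 2 fails because it has no "a" row, so only ID 3 is returned.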
You can use IN with a subquery:
select top 1 * from t
where t.id in
(
select id from t
where name in ('a','b')
group by id
having sum(case when value >= 1 then 1 else 0 end) >= 2
)
order by id, name

How to increment the counting for each non-consecutive value?

Below is a simple representation of my table:
ID | GA
----------
1 | 1.5
2 | 1.5
3 | 1.2
4 | 1.5
5 | 1.3
I would like to count the number of occurrences of the GA column's values, BUT the count should not increment when the value is the same as the previous row.
What I would like to expect is like this:
ID | GA | COUNT
-------------------
1 | 1.5 | 1
2 | 1.5 | 1
3 | 1.2 | 1
4 | 1.5 | 2
5 | 1.3 | 1
Notice that the count for GA = 1.5 is 2 at ID 4. This is because ID 3 sits between IDs 2 and 4 and breaks the run of 1.5 values.
NOTE: The ordering by ID also matters.
Here's what I've done so far:
SELECT ID,GA,COUNT (*) OVER (
PARTITION BY GA
ORDER BY ID
) COUNT
FROM (
SELECT 1 AS ID,'1.5' AS GA
FROM DUAL
UNION
SELECT 2,'1.5' FROM DUAL
UNION
SELECT 3,'1.2' FROM DUAL
UNION
SELECT 4,'1.5' FROM DUAL
UNION
SELECT 5,'1.3' FROM DUAL
) FOO
ORDER BY ID;
But the result is far from expectation:
ID | GA | COUNT
-------------------
1 | 1.5 | 1
2 | 1.5 | 2
3 | 1.2 | 1
4 | 1.5 | 3
5 | 1.3 | 1
Notice that even if they are consecutive values, the count is still incrementing.
It seems that you are asking for a kind of running total, not just a global count.
Assuming that the input data is in a table named input_data, this should do the trick:
WITH
with_previous AS (
SELECT id, ga, LAG(ga) OVER (ORDER BY id) AS previous_ga
FROM input_data
),
just_new AS (
SELECT id,
ga,
CASE
WHEN previous_ga IS NULL
OR previous_ga <> ga
THEN ga
END AS new_ga
FROM with_previous
)
SELECT id,
ga,
COUNT(new_ga) OVER (PARTITION BY ga ORDER BY id) AS ga_count
FROM just_new
ORDER BY 1
See sqlfiddle: http://sqlfiddle.com/#!4/187e13/1
Result:
ID | GA | GA_COUNT
----+-----+----------
1 | 1.5 | 1
2 | 1.5 | 1
3 | 1.2 | 1
4 | 1.5 | 2
5 | 1.3 | 1
6 | 1.5 | 3
7 | 1.5 | 3
8 | 1.3 | 2
I took the sample data from @D-Shih's sqlfiddle.
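As a quick sanity check of the LAG/COUNT approach, this sketch runs the same CTEs against the question's five rows using Python's sqlite3 module. It assumes SQLite 3.25+ for window functions and keeps the answer's table name input_data; GA is stored as REAL here rather than as the strings in the question.

```python
import sqlite3

# Sample data from the question: (id, ga)
ROWS = [(1, 1.5), (2, 1.5), (3, 1.2), (4, 1.5), (5, 1.3)]

QUERY = """
WITH
with_previous AS (
    SELECT id, ga, LAG(ga) OVER (ORDER BY id) AS previous_ga
    FROM input_data
),
just_new AS (
    -- keep ga only on rows that start a new run; other rows get NULL
    SELECT id, ga,
           CASE WHEN previous_ga IS NULL OR previous_ga <> ga THEN ga END AS new_ga
    FROM with_previous
)
-- COUNT ignores NULLs, so only run starts are counted
SELECT id, ga,
       COUNT(new_ga) OVER (PARTITION BY ga ORDER BY id) AS ga_count
FROM just_new
ORDER BY 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (id INTEGER PRIMARY KEY, ga REAL)")
conn.executemany("INSERT INTO input_data VALUES (?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```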
As I understand the problem, this is a variation of a gaps-and-islands problem. You want to enumerate the groups for each ga value independently.
If this interpretation is correct, then I would go for dense_rank() and the difference of row numbers:
select t.*, dense_rank() over (partition by ga order by seqnum_1 - seqnum_2)
from (select t.*,
row_number() over (order by id) as seqnum_1,
row_number() over (partition by ga order by id) as seqnum_2
from t
) t
order by id;
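The difference-of-row-numbers trick can be checked with a short sketch using Python's sqlite3 module, assuming SQLite 3.25+ and a table named t as in the answer; the dense_rank() column is given the alias ga_count for readability. Within one ga value, seqnum_1 - seqnum_2 stays constant across a consecutive run and changes when another value interrupts it, so dense_rank() over that difference numbers the runs.

```python
import sqlite3

# Sample data from the question: (id, ga)
ROWS = [(1, 1.5), (2, 1.5), (3, 1.2), (4, 1.5), (5, 1.3)]

QUERY = """
SELECT id, ga,
       DENSE_RANK() OVER (PARTITION BY ga ORDER BY seqnum_1 - seqnum_2) AS ga_count
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY id) AS seqnum_1,                 -- position overall
           ROW_NUMBER() OVER (PARTITION BY ga ORDER BY id) AS seqnum_2  -- position within ga
    FROM t
) AS islands
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, ga REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```

For GA = 1.5 the differences are 0, 0, 1 at IDs 1, 2, 4, which dense_rank() turns into 1, 1, 2.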
Use a subquery with the LAG and SUM analytic functions:
SELECT id, ga,
sum( cnt ) over (partition by ga order by id) as cnt
FROM (
select t.*,
case lag(ga) over (order by id)
when ga then 0 else 1
end cnt
from Tab t
)
order by id
| ID | GA | CNT |
|----|-----|-----|
| 1 | 1.5 | 1 |
| 2 | 1.5 | 1 |
| 3 | 1.2 | 1 |
| 4 | 1.5 | 2 |
| 5 | 1.3 | 1 |
Demo: http://sqlfiddle.com/#!4/5ddd1/5
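The same LAG/SUM idea can be sketched in Python's sqlite3 module (SQLite 3.25+ assumed, table named tab as in the answer). The inner query flags a row with 1 when its ga differs from the previous row's (the first row's LAG is NULL, which never matches, so it also gets 1), and the outer running SUM per ga adds up those flags. The outer column is renamed ga_count here to avoid shadowing the inner cnt column.

```python
import sqlite3

# Sample data from the question: (id, ga)
ROWS = [(1, 1.5), (2, 1.5), (3, 1.2), (4, 1.5), (5, 1.3)]

QUERY = """
SELECT id, ga,
       SUM(cnt) OVER (PARTITION BY ga ORDER BY id) AS ga_count
FROM (
    SELECT t.*,
           -- 0 when ga repeats the previous row's value, otherwise 1
           CASE LAG(ga) OVER (ORDER BY id) WHEN ga THEN 0 ELSE 1 END AS cnt
    FROM tab t
) AS flagged
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, ga REAL)")
conn.executemany("INSERT INTO tab VALUES (?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```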

Select the latest message thread values from a table using sql

This is my table
Id | ReceiverId | SenderId | Text
-----------+---------------+--------------+-----------
1 | 5 | 1 | text
2 | 5 | 1 | text
3 | 1 | 5 | text
4 | 2 | 5 | text
5 | 2 | 5 | text
6 | 5 | 3 | text
7 | 5 | 4 | text
9 | 5 | 6 | text
10 | 5 | 4 | text
11 | 10 | 5 | text
12 | 5 | 10 | text
13 | 10 | 5 | text
14 | 5 | 10 | text
How do I select rows without duplicates based on the [ReceiverId, SenderId] pair, ordered by Id in descending order? That is, [5,1] and [1,5] are duplicates, and two rows that are both [5,1] are also duplicates.
So the final result should be:
Id | ReceiverId | SenderId | Text
-----------+---------------+--------------+-----------
14 | 5 | 10 | text
10 | 5 | 4 | text
9 | 5 | 6 | text
6 | 5 | 3 | text
5 | 2 | 5 | text
3 | 1 | 5 | text
Assuming that, among records you consider the same based only on SenderId and ReceiverId (in either order), you want the one with the largest Id (which is probably the latest), this query will give you the result:
select Id, ReceiverId, SenderId, [Text]
from MyTable t
where t.Id in (
select top 1 tt.Id
from MyTable tt
where (tt.SenderId = t.SenderId and tt.ReceiverId = t.ReceiverId) or
(tt.SenderId = t.ReceiverId and tt.ReceiverId = t.SenderId)
order by tt.Id desc
)
order by t.Id desc
Replace MyTable with your table's name.
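To see the correlated-subquery approach in action, here is a sketch using Python's sqlite3 module with the question's rows. SQLite does not support TOP, so the sketch uses ORDER BY ... LIMIT 1 inside the subquery instead; that swap, and double-quoting the Text column, are the only changes from the answer's query.

```python
import sqlite3

# (Id, ReceiverId, SenderId, Text) from the question
ROWS = [
    (1, 5, 1, "text"), (2, 5, 1, "text"), (3, 1, 5, "text"),
    (4, 2, 5, "text"), (5, 2, 5, "text"), (6, 5, 3, "text"),
    (7, 5, 4, "text"), (9, 5, 6, "text"), (10, 5, 4, "text"),
    (11, 10, 5, "text"), (12, 5, 10, "text"), (13, 10, 5, "text"),
    (14, 5, 10, "text"),
]

QUERY = """
SELECT Id, ReceiverId, SenderId, "Text"
FROM MyTable t
WHERE t.Id IN (
    -- latest message for this pair, in either direction
    SELECT tt.Id
    FROM MyTable tt
    WHERE (tt.SenderId = t.SenderId AND tt.ReceiverId = t.ReceiverId)
       OR (tt.SenderId = t.ReceiverId AND tt.ReceiverId = t.SenderId)
    ORDER BY tt.Id DESC
    LIMIT 1
)
ORDER BY t.Id DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, ReceiverId INTEGER, SenderId INTEGER, "Text" TEXT)'
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()
print([row[0] for row in result])
```

The Ids come back as 14, 10, 9, 6, 5, 3, matching the expected result in the question.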
select b.ID,
a.senderid_final,
a.receiverid_final,
b.Text
from
(
select a.receiverid as a_receiverid,
a.senderid as a_senderid ,
b.receiverid as b_receiverid,
b.senderid as b_senderid,
case when max(a.id) > max (b.id) then a.receiverid else b.receiverid end as receiverid_final,
case when max(a.id) > max (b.id) then a.senderid else b.senderid end as senderid_final
from my_table as a
inner join my_table as b
on a.receiverid = b.senderid
and b.receiverid = a.senderid
group by a.receiverid, a.senderid, b.receiverid, b.senderid
) as a
inner join my_table as b
on a.receiverid_final = b.receiverid
and b.senderid = a.senderid_final
Order by b.id desc

distinct rows with group by

I have one table:
id_object | version | document
------------------------------
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
I want to show only one row per object, with the (max) version and its document. I have tried the following:
Select Distinct
id_object ,
Max(version),
document
From
prods
Group By
id_object, document
and I get this result
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
As you can see, I'm getting the entire table. My question is, why?
Since you group by id_object and document, you won't get your desired result: each (id_object, document) combination occurs only once in your data, so every row becomes its own group.
select x.id_object,
x.maxversion as version,
p.document
from
(
Select id_object, Max(version) as maxversion
From prods
Group By id_object
) x
inner join prods p on p.id_object = x.id_object
and p.version = x.maxversion
You first have to select the id_object with the max(version). That can be joined with the actual data to get the correct document.
You have to do that because you can't select columns that are not in your GROUP BY clause unless you apply an aggregate function to them (like max(), for instance).
(MySQL can select non-aggregated columns, but please avoid that, since the outcome is not always clear or even predictable.)
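Here is a sketch that runs both the question's query and the join-back fix against the sample data with Python's sqlite3 module, assuming a table named prods as in the question. It shows why the original returns every row and that the fix returns one row per object.

```python
import sqlite3

# (id_object, version, document) from the question
ROWS = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 1), (2, 3, 2), (1, 1, 3)]

# The question's query: grouping by document as well makes every row its own group
ORIGINAL = """
SELECT DISTINCT id_object, MAX(version), document
FROM prods
GROUP BY id_object, document
"""

# The answer's fix: find the max version per object, then join back for the document
FIXED = """
SELECT x.id_object, x.maxversion AS version, p.document
FROM (
    SELECT id_object, MAX(version) AS maxversion
    FROM prods
    GROUP BY id_object
) x
INNER JOIN prods p ON p.id_object = x.id_object
                  AND p.version = x.maxversion
ORDER BY x.id_object
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prods (id_object INTEGER, version INTEGER, document INTEGER)")
conn.executemany("INSERT INTO prods VALUES (?, ?, ?)", ROWS)
original_rows = conn.execute(ORIGINAL).fetchall()
fixed_rows = conn.execute(FIXED).fetchall()
conn.close()
print(len(original_rows), fixed_rows)
```

The original query returns all six rows, while the fixed one returns (1, 2, 2) and (2, 3, 2), the same rows as the correlated-subquery answer's result table.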
select prods.id_object, version, document
from prods inner join
(select id_object, max(version) as ver
from prods
group by id_object) tmp on prods.id_object = tmp.id_object and prods.version = tmp.ver
Query:
SELECT p.id_object,
p.version,
p.document
FROM prods p
WHERE p.version = (SELECT Max(version)
FROM prods
WHERE id_object = p.id_object)
Result:
| ID_OBJECT | VERSION | DOCUMENT |
----------------------------------
| 1 | 2 | 2 |
| 2 | 3 | 2 |