Order rows based on aggregated data - sql

I have a sample table as shown below:
select * from sampleTable;
label | data
-------+------
a | 1
b | 2
c | 3
d | 4
a | 5
b | 6
(6 rows)
I need the rows sorted by the summed 'data' values per label, i.e. c (total 3) should come first and b (combined total of 2 + 6 = 8) should come last, with the others in between, as shown below:
label | data
-------+------
c | 3
d | 4
a | 1
a | 5
b | 2
b | 6
I have tried to achieve this with a self join as shown below. But it seems a bit verbose. Am I doing it right or is there a better way to achieve the same without joins?
select l, data from sampleTable join (select label as l, sum(data) as x from sampleTable group by l) m on label = m.l order by x;
l | data
---+------
c | 3
d | 4
a | 1
a | 5
b | 2
b | 6
(6 rows)

You can avoid the self-join by using SUM as a window function, something like this:
SELECT label
, data
FROM (
SELECT *
, SUM(data) OVER (PARTITION BY label) pts
FROM sampleTable
) AS rez
ORDER BY pts

You don't need a self-join or a subquery. You can use window functions in the order by:
select t.*
from t
order by sum(data) over (partition by label),
label;
Note the inclusion of label as the second key. This is important for distinguishing ties in the data: when two labels have the same total, it ensures that all rows for a given label appear together.
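To make the tie-breaking concrete, here is a minimal, self-contained sketch. It assumes the table is named sampleTable with columns label and data, as in the question (not the t used above). The third sort key, data, is an addition of mine so that the row order within a label is fully deterministic.

```sql
-- Sample data copied from the question.
CREATE TABLE sampleTable (label VARCHAR(10), data INT);

INSERT INTO sampleTable (label, data) VALUES
    ('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 5), ('b', 6);

-- Sort by each label's total, then by label so that labels with equal
-- totals are not interleaved, then by data (an extra key, assumed here)
-- so that rows within a label come out in a stable order.
SELECT label, data
FROM sampleTable
ORDER BY SUM(data) OVER (PARTITION BY label),
         label,
         data;
```

With the sample rows the totals are c = 3, d = 4, a = 6 and b = 8, so the rows come back in the order requested in the question.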

Simply use the sum window function in ORDER BY
SELECT l, d
FROM tab
ORDER BY SUM(d) OVER (PARTITION BY l)

Related

Find the count of IDs that have the same value

I'd like to get a count of all the IDs that have the same value (Drops) as other IDs. For instance, the illustration below shows that IDs 1 and 3 both have A drops, so the query would count them. Similarly, IDs 7 and 18 both have B drops, so that's another two IDs the query would count, for a total of 4 IDs that share values with other IDs. That total is what my query should return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried several approaches; the following query was my last attempt.
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
The final result would be a single value, but the records that the query should be counting look as follows:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks
You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
ID Drops
1 A
3 A
7 B
18 B
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output
newcnt
4
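The grouped variant mentioned above is not spelled out in the answer. A minimal sketch, assuming the Posts table and column names from the question, could look like this. No join is needed here because only the per-value counts are returned.

```sql
-- Sample data copied from the question.
CREATE TABLE Posts (ID INT, Drops VARCHAR(10));

INSERT INTO Posts (ID, Drops) VALUES
    (1, 'A'), (2, 'C'), (3, 'A'), (7, 'B'), (18, 'B');

-- One row per Drops value that is shared by more than one ID.
SELECT Drops, COUNT(*) AS newcnt
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1;
```

With the sample rows this returns A and B with a count of 2 each; C is filtered out by the HAVING clause.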
You may use dense_rank() to solve your problem. Within each drops partition, rows with the same id receive the same rank, so counting the distinct ranks gives the number of distinct IDs for that drops value.
Here is the demo.
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |
First count the IDs for each Drops value, then sum the counts that are greater than 1.
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;
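If the matching rows themselves are wanted rather than the total, a windowed count is another way to avoid a join. This is only a sketch: it assumes the Posts table from the question, and the alias counted and the column cnt are names chosen for illustration.

```sql
-- Sample data copied from the question.
CREATE TABLE Posts (ID INT, Drops VARCHAR(10));

INSERT INTO Posts (ID, Drops) VALUES
    (1, 'A'), (2, 'C'), (3, 'A'), (7, 'B'), (18, 'B');

-- Attach to every row the number of rows sharing its Drops value,
-- then keep only rows whose value occurs more than once.
SELECT ID, Drops
FROM (
    SELECT ID,
           Drops,
           COUNT(*) OVER (PARTITION BY Drops) AS cnt
    FROM Posts
) AS counted
WHERE cnt > 1
ORDER BY ID;
```

The derived table is needed because a window function cannot be referenced directly in the WHERE clause of the same query level.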

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
Ideally, I'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
One solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
This might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique, or switch to row_number, which returns an arbitrary one of the tied rows.
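QUALIFY is not part of standard SQL; it is supported by Teradata and some other engines. Where it is unavailable, the same filter can be sketched with a derived table. The table myTable and its columns are taken from the question, while the alias ranked and the column rn are illustrative names.

```sql
-- Sample data copied from the question.
CREATE TABLE myTable (id INT, line INT, value INT);

INSERT INTO myTable (id, line, value) VALUES
    (1, 1, 10), (1, 2, 5), (2, 1, 6), (3, 1, 7), (3, 2, 4);

-- Number the rows of each id from the highest line downwards,
-- then keep only the first row of each id.
SELECT id, value
FROM (
    SELECT id,
           value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY line DESC) AS rn
    FROM myTable
) AS ranked
WHERE rn = 1
ORDER BY id;
```

With the sample rows this returns 5 for id 1, 6 for id 2 and 4 for id 3, matching the result asked for in the question.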

I want to group one column based on a condition on another column

Suppose I have a table like this:
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | A
2 | C
3 | A
3 | A
3 | B
3 | B
I want to write a query that groups the values in such a way that I get a table like this:
Column A | Column B
---------+---------
1 | A
1 | B
2 | A
2 | C
3 | A
3 | B
You are looking for DISTINCT. DISTINCT removes duplicates from your query result.
select distinct * from mytable;
An alternative would be aggregation. Grouping by a and b gives you one result row per combination of a and b. However, you'd only use this when you want aggregates, e.g. the number of rows, a sum, or an average; otherwise DISTINCT as shown does the job and should be preferred.
select a, b, count(*) from mytable group by a, b;
You need to use GROUP BY on both columns.
select col1, col2 from test group by col1,col2;

Remove partial duplicates sql server

I am altering an existing view within SQL Server. My union statement creates something along the lines of:
Col1 | C2 | C3 | C4
-----|----|------|-----
1 A | B | NULL | NULL
2 A | B | C | NULL
3 A | B | C | D
4 E | F | NULL | NULL
5 E | F | G | NULL
However, I only want (in this scenario) rows 3 and 5. I need to omit rows 1 and 2 because they contain duplicate info: the columns they do have filled in hold the same values as row 3, but row 3 is the most 'complete'. Row 5 is wanted over row 4 for the same reason.
Is this an outer join / intersect issue? How the heck do you create a view in this manner?
Assuming that Col1 is not NULL, we can use ROW_NUMBER, partitioned by Col1 and ordered by the concatenation of all four columns in descending order, and then keep the first row of each partition:
; with cte
AS
(
select ROW_NUMBER() over ( partition by col1 order by (coalesce(Col1,'')+
coalesce([C2],'') +
coalesce([C3],'') +
coalesce([C4],'') ) desc) as seq,
*
FROM Table1
)
select * from cte
where seq =1
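Ordering by the concatenated string works for the sample, but the outcome depends on the actual values: within one Col1 group, a sparse row such as ('A', 'Z', NULL, NULL) would sort ahead of ('A', 'B', 'C', 'D'). A variation, offered as a sketch rather than as the answer's method, is to rank rows by how many columns are filled in. Table1 and the column names follow the answer above; treating Col1 alone as the grouping key is an assumption carried over from it.

```sql
-- Sample data copied from the question.
CREATE TABLE Table1 (
    Col1 VARCHAR(10), C2 VARCHAR(10), C3 VARCHAR(10), C4 VARCHAR(10)
);

INSERT INTO Table1 (Col1, C2, C3, C4) VALUES
    ('A', 'B', NULL, NULL),
    ('A', 'B', 'C',  NULL),
    ('A', 'B', 'C',  'D'),
    ('E', 'F', NULL, NULL),
    ('E', 'F', 'G',  NULL);

-- Rank the rows of each Col1 group by their number of non-NULL columns,
-- most complete first, and keep the top row of each group.
WITH cte AS (
    SELECT Col1, C2, C3, C4,
           ROW_NUMBER() OVER (
               PARTITION BY Col1
               ORDER BY (CASE WHEN C2 IS NOT NULL THEN 1 ELSE 0 END
                       + CASE WHEN C3 IS NOT NULL THEN 1 ELSE 0 END
                       + CASE WHEN C4 IS NOT NULL THEN 1 ELSE 0 END) DESC
           ) AS seq
    FROM Table1
)
SELECT Col1, C2, C3, C4
FROM cte
WHERE seq = 1;
```

For the sample rows this keeps (A, B, C, D) and (E, F, G, NULL), i.e. rows 3 and 5 from the question. If two rows in a group are equally complete, which one is kept is not determined by this ordering.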

Trying to select multiple columns where one is unique

I am trying to select several columns from a table where one of the columns is unique. The select statement looks something like this:
select a, distinct b, c, d
from mytable
The table looks something like this:
| a | b | c | d | e |...
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5
| 1 | 2 | 3 | 4 | 6
| 2 | 5 | 7 | 1 | 9
| 7 | 3 | 8 | 6 | 4
| 7 | 3 | 8 | 6 | 7
So the query should return something like this:
| a | b | c | d |
|---|---|---|---|
| 1 | 2 | 3 | 4
| 2 | 5 | 7 | 1
| 7 | 3 | 8 | 6
I just want to remove all of the rows where b is duplicated.
EDIT: There seems to be some confusion about which row I want to be selected in the case of duplicate b values. I don't care because the a, c, and d should (but are not guaranteed to) be the same.
Try this
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) NO
,* FROM TableName) AS T1 WHERE NO = 1
I think you are nearly there with DISTINCT. Try:
SELECT DISTINCT a, b, c, d
FROM myTable
You haven't said how to pick a row for each b value, but this will pick one for each.
Select
a,
b,
c,
d,
e
From (
Select
a,
b,
c,
d,
e,
row_number() over (partition by b order by b) rn
From
mytable
) x
Where
x.rn = 1
If you don't care what values you get for B, C, D, and E, as long as they're appropriate for that key, you can group by A:
SELECT A, MIN(B), MIN(C), MIN(D), MIN(E)
FROM MyTable
GROUP BY A
Note that MAX() would be just as valid. Some RDBMSs support a FIRST() aggregate, or similar, for exactly these circumstances where you don't care which value you get (from a certain population).
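The question asks for one row per value of b rather than a, so the same aggregate idea would be keyed on b. A minimal sketch, assuming the mytable layout from the question:

```sql
-- Sample data copied from the question.
CREATE TABLE mytable (a INT, b INT, c INT, d INT, e INT);

INSERT INTO mytable (a, b, c, d, e) VALUES
    (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (2, 5, 7, 1, 9),
    (7, 3, 8, 6, 4), (7, 3, 8, 6, 7);

-- One output row per distinct b. Each MIN() is evaluated independently,
-- so a, c and d may come from different source rows when they differ.
SELECT MIN(a) AS a, b, MIN(c) AS c, MIN(d) AS d
FROM mytable
GROUP BY b
ORDER BY MIN(a);
```

For the sample data, where a, c and d are identical within each b, this returns the three rows shown in the question. When those columns can differ within a b group, a ROW_NUMBER-based query is the safer choice because it returns values from a single source row.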
This will return what you're looking for but I think your example is flawed because you've no determinism over which value from the e column is returned.
Create Table A1 (a int, b int, c int, d int, e int)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,5)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,6)
INSERT INTO A1 (a,b,c,d,e) VALUES (2,5,7,1,9)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,4)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,7)
SELECT * FROM A1
SELECT a,b,c,d
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) RowNum ,*
FROM A1
) As InnerQuery WHERE RowNum = 1
You cannot put DISTINCT on a single column. You should put it right after the SELECT:
SELECT DISTINCT a, b, c, d
FROM mytable
It returns the result you need for your sample table. However, if you need to remove duplicates based on a single column only (which is not possible with DISTINCT), you have probably misunderstood something. Give us more details and sample data, and we will try to point you in the right direction.