Exclude rows with the same values in some columns - sql

I have a table like the following:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
1 | abc | 23 | data1 | otherdata1
2 | def | 41 | data2 | otherdata2
3 | ghi | 41 | data3 | otherdata3
4 | jkl | 58 | data4 | otherdata4
5 | mno | 23 | data1 | otherdata5
6 | pqr | 41 | data3 | otherdata6
7 | stu | 76 | data2 | otherdata7
How can I quickly select the rows whose (col2, col3) combination has no duplicates? There are over 15 million rows in the table, so a join may not be suitable.
The final result should look like this:
id | col1 | col2 | col3 | col4
---+------+------+--------+-----------
2 | def | 41 | data2 | otherdata2
4 | jkl | 58 | data4 | otherdata4
7 | stu | 76 | data2 | otherdata7

Not sure how fast this will be, but this should work:
select id, col1, col2, col3, col4
from (
select id, col1, col2, col3, col4,
count(*) over (partition by col2, col3) as cnt
from the_table
) t
where cnt = 1
order by id;

Window functions are definitely one possibility. But, if you care about performance, it is also worth trying another approach and comparing the speed.
NOT EXISTS comes to mind:
select t.*
from table t
where not exists (select 1
from table t2
where t2.col2 = t.col2 and t2.col3 = t.col3 and
t2.id <> t.id
);
This can take advantage of an index on table(col2, col3).
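To compare the two approaches on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions); the index name idx_col2_col3 and the helper names are made up for illustration, and timings on a 15-million-row production table would of course have to be measured on the real DBMS.

```python
import sqlite3

ROWS = [
    (1, "abc", 23, "data1", "otherdata1"),
    (2, "def", 41, "data2", "otherdata2"),
    (3, "ghi", 41, "data3", "otherdata3"),
    (4, "jkl", 58, "data4", "otherdata4"),
    (5, "mno", 23, "data1", "otherdata5"),
    (6, "pqr", 41, "data3", "otherdata6"),
    (7, "stu", 76, "data2", "otherdata7"),
]

# Window-function variant: count the rows sharing each (col2, col3) pair.
WINDOW_SQL = """
SELECT id FROM (
    SELECT id, COUNT(*) OVER (PARTITION BY col2, col3) AS cnt
    FROM the_table
) AS t
WHERE cnt = 1
ORDER BY id
"""

# NOT EXISTS variant: keep a row only if no other row has the same pair.
NOT_EXISTS_SQL = """
SELECT t.id
FROM the_table AS t
WHERE NOT EXISTS (
    SELECT 1 FROM the_table AS t2
    WHERE t2.col2 = t.col2 AND t2.col3 = t.col3 AND t2.id <> t.id
)
ORDER BY t.id
"""


def build_db():
    """Create an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (id INTEGER PRIMARY KEY, col1 TEXT,"
        " col2 INTEGER, col3 TEXT, col4 TEXT)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?, ?)", ROWS)
    # Composite index that the NOT EXISTS probe can use.
    conn.execute("CREATE INDEX idx_col2_col3 ON the_table (col2, col3)")
    return conn


def unique_ids(sql):
    """Run one of the queries above and return the matching ids."""
    conn = build_db()
    try:
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(unique_ids(WINDOW_SQL))
    print(unique_ids(NOT_EXISTS_SQL))
```

Both queries should return the same ids on this data, so the choice between them comes down to how each one performs on your engine and indexes.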

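Try this as well. Note that it keeps one (arbitrary) row from each (col2, col3) group rather than excluding duplicated groups, so it deduplicates instead of producing the result shown in the question: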
select * from
(
select id,col1,col2,col3,col4
,row_number() over (partition by col2,col3 order by col2,col3 desc ) as rnm
from
table
) x where rnm =1;
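For comparison, a small sketch of what a ROW_NUMBER() filter like the one above returns on the question's sample data. It uses Python's sqlite3 module (assuming SQLite 3.25+), and the ORDER BY inside the window is changed to id so that the surviving row per group is deterministic; that change is an assumption made only for this illustration.

```python
import sqlite3

ROWS = [
    (1, "abc", 23, "data1", "otherdata1"),
    (2, "def", 41, "data2", "otherdata2"),
    (3, "ghi", 41, "data3", "otherdata3"),
    (4, "jkl", 58, "data4", "otherdata4"),
    (5, "mno", 23, "data1", "otherdata5"),
    (6, "pqr", 41, "data3", "otherdata6"),
    (7, "stu", 76, "data2", "otherdata7"),
]


def first_row_per_group():
    """Return the ids kept when filtering on rnm = 1 (one row per group)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (id INTEGER, col1 TEXT, col2 INTEGER,"
        " col3 TEXT, col4 TEXT)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?, ?)", ROWS)
    ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY col2, col3 ORDER BY id
                       ) AS rnm
                FROM the_table
            ) AS x
            WHERE rnm = 1
            ORDER BY id
            """
        )
    ]
    conn.close()
    return ids


if __name__ == "__main__":
    print(first_row_per_group())
```

Ids 1 and 3 survive here even though their (col2, col3) pairs are duplicated, which is why this pattern suits deduplication rather than excluding duplicated groups.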

Related

If 2 rows have the same ID select one with the greater other column value

I'm having difficulty getting my head round this one, which should be simple.
When selecting from the table, if multiple rows have the same ID then select the row which has a greater value in Col2.
Here is my sample table:
ID | Col2 |
----------------
123 | 1 |
123 | 2 |
1234 | 2 |
12345 | 3 |
Expected output:
ID | Col2 |
----------------
123 | 2 |
1234 | 2 |
12345 | 3 |
For this example, GROUP BY is sufficient:
select id, max(col2) as col2
from t
group by id;
If you want the entire row that holds the maximum col2 value, then I would often recommend row_number():
select t.*
from (select t.*, row_number() over (partition by id order by col2 desc) as seqnum
from t
) t
where seqnum = 1;
However, the "old-fashioned" method might have better performance:
select t.*
from t
where t.col2 = (select max(t2.col2) from t t2 where t2.id = t.id);
NOT EXISTS operator can also be used:
SELECT * FROM Table1 t1
WHERE NOT EXISTS(
SELECT 'Anything' FROM Table1 t2
WHERE t1.id = t2.id
AND t1.Col2 < t2.col2
)
Demo: http://sqlfiddle.com/#!18/5e1d6/3
| ID | Col2 |
|-------|------|
| 123 | 2 |
| 1234 | 2 |
| 12345 | 3 |
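As a quick cross-check, the four approaches from the answers can be run side by side on the sample table. This is only a sketch using Python's sqlite3 module (assuming SQLite 3.25+ for ROW_NUMBER()); the dictionary keys and the helper name are invented for the example.

```python
import sqlite3

QUERIES = {
    "group_by": "SELECT id, MAX(col2) AS col2 FROM t GROUP BY id ORDER BY id",
    "row_number": """
        SELECT id, col2 FROM (
            SELECT id, col2,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY col2 DESC) AS seqnum
            FROM t
        ) AS ranked
        WHERE seqnum = 1
        ORDER BY id
    """,
    "correlated_max": """
        SELECT id, col2 FROM t
        WHERE col2 = (SELECT MAX(t2.col2) FROM t AS t2 WHERE t2.id = t.id)
        ORDER BY id
    """,
    "not_exists": """
        SELECT id, col2 FROM t AS t1
        WHERE NOT EXISTS (
            SELECT 1 FROM t AS t2
            WHERE t1.id = t2.id AND t1.col2 < t2.col2
        )
        ORDER BY id
    """,
}


def run_all():
    """Run every query against the sample data and return the results by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, col2 INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(123, 1), (123, 2), (1234, 2), (12345, 3)],
    )
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_all().items():
        print(name, rows)
```

One difference to keep in mind: if two rows tie for the maximum col2 within an id, the correlated-subquery and NOT EXISTS versions return both rows, while ROW_NUMBER() and GROUP BY return one.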

Oracle group by only ONE column

I have a table in an Oracle database, which has 40 columns.
I know that if I want to do a group by query, all the columns in the select must be in the group by.
I simply want to do:
select col1, col2, col3, col4, col5 from table group by col3
If I try:
select col1, col2, col3, col4, col5 from table group by col1, col2, col3, col4, col5
It does not give the required output.
I have searched for this, but did not find any solution. All the queries that I found use some kind of Add() or count(*) function.
In Oracle, is it not possible to simply group by one column?
UPDATE:
My apologies, for not being clear enough.
My Table:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 1 | 1 | some text 1 | 100 |
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 4 | 3 | some text 1 | 78 |
| 5 | 4 | some text 1 | 65 |
| 6 | 5 | some text 1 | 101 |
| 7 | 5 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 |
| 9 | 6 | some text 1 | 202 |
+--------+----------+-------------+-------+
and by running the following query:
select col1, col2, col3 from table where col3='200' group by col1;
I would like to get the following output:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
+--------+----------+-------------+-------+
Long comment here;
Yeah, you can't do that. Think about it... If you have a table like so:
Col1 Col2 Col3
A A 1
B A 2
C A 3
And you're grouping by only Col2, which will group down to a single row... what happens to Col1 and Col3? Both of those have 3 distinct row values.
How is your DBMS supposed to display those?
Col1 Col2 Col3
A? A 1?
B? 2?
C? 3?
This is why you have to group by all columns, or otherwise aggregate or concatenate them (SUM(), MAX(), MIN(), etc.).
Show us how you want the results to look and I'm sure we can help you.
Edit - Answer:
First off, thanks for updating your question. Your query doesn't have id but your expected results do, so I will answer for each separately.
Without id
You will still need to group by all columns to achieve what you're going for. Let's walk through it.
If you run your query without any group by:
select col1, col2, col3 from table where col3='200'
You will get this back:
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 |
| 5 | some text 1 | 200 |
| 1 | some text 1 | 200 |
+----------+-------------+-------+
So now you want to see the col1 = 1 row only once. But to do so, you need to roll all of the columns up, so your DBMS knows what to do with each of them. If you try to group by only col1, your DBMS will throw an error because you didn't tell it what to do with the extra data in col2 and col3:
select col1, col2, col3 from table where col3='200' group by col1 --Errors
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 |
| 5 | some text 1 | 200 |
| ? | some text 1?| 200? |
+----------+-------------+-------+
If you group by all 3, your DBMS knows to group together the entire rows (which is what you want), and will only display duplicate rows once:
select col1, col2, col3 from table where col3='200' group by col1, col2, col3
+----------+-------------+-------+
| col1 | col2 | col3 |
+----------+-------------+-------+
| 1 | some text 1 | 200 |
| 2 | some text 1 | 200 | --Desired results
| 5 | some text 1 | 200 |
+----------+-------------+-------+
With id
If you want to see id, you will have to tell your DBMS which id to display. Even if we group by all columns, you won't get your desired results, because the id column will make each row distinct (They will no longer group together):
select id, col1, col2, col3 from table where col3='200' group by id, col1, col2, col3
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 | --id = 2
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 | --id = 8
+--------+----------+-------------+-------+
So in order to group these rows, we need to explicitly say what to do with the ids. Based on your desired results, you want to choose id = 2, which is the minimum id, so let's use MIN():
select MIN(id), col1, col2, col3 from table where col3='200' group by col1, col2, col3
--Note, MIN() is an aggregate function, so id need not be in the group by
Which returns your desired results (with id):
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 3 | 2 | some text 1 | 200 |
| 7 | 5 | some text 1 | 200 |
+--------+----------+-------------+-------+
Final thought
Here were your two trouble rows:
+--------+----------+-------------+-------+
| id | col1 | col2 | col3 |
+--------+----------+-------------+-------+
| 2 | 1 | some text 1 | 200 |
| 8 | 1 | some text 1 | 200 |
+--------+----------+-------------+-------+
Any time you hit these, just think about what you want each column to do, one at a time. You will need to handle all columns any time you do grouping or aggregates.
id, you only want to see id = 2, which is the MIN()
col1, you only want to see distinct values, so GROUP BY
col2, you only want to see distinct values, so GROUP BY
col3, you only want to see distinct values, so GROUP BY
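The "With id" walkthrough above can be reproduced with a short sketch. It uses Python's sqlite3 module with the question's sample rows; the table name t1 is a stand-in (table is a reserved word), and note that SQLite, unlike Oracle, tolerates ungrouped columns in the select list, so the error case is not reproduced here.

```python
import sqlite3

ROWS = [
    (1, 1, "some text 1", 100),
    (2, 1, "some text 1", 200),
    (3, 2, "some text 1", 200),
    (4, 3, "some text 1", 78),
    (5, 4, "some text 1", 65),
    (6, 5, "some text 1", 101),
    (7, 5, "some text 1", 200),
    (8, 1, "some text 1", 200),
    (9, 6, "some text 1", 202),
]

# Grouping by id as well keeps every row distinct (ids 2 and 8 both appear).
GROUP_WITH_ID = """
SELECT id, col1, col2, col3 FROM t1
WHERE col3 = 200
GROUP BY id, col1, col2, col3
ORDER BY 1
"""

# Aggregating id with MIN() lets the duplicate (col1, col2, col3) rows collapse.
GROUP_MIN_ID = """
SELECT MIN(id) AS id, col1, col2, col3 FROM t1
WHERE col3 = 200
GROUP BY col1, col2, col3
ORDER BY 1
"""


def run(sql):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, col1 INTEGER, col2 TEXT, col3 INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(GROUP_WITH_ID))
    print(run(GROUP_MIN_ID))
```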
Maybe analytic functions are what you need.
Try something like this:
select col1, col2, col3, col4, col5
-- sum() needs a column argument; col3 is used here only as an example
, sum(col3) over (partition by col1) as col1_summary
, count(*) over () as total_count
from t1
If you search for the topic, you will find thousands of examples,
for example this one:
Introduction to Analytic Functions (Part 1)
Why do you want to GROUP BY , wouldn't you want to ORDER BY instead?
If you state an English language version of the problem you are trying to solve (i.e. the requirements) it would be easier to be more specific.
I guess maybe you need the UNPIVOT function;
otherwise, post the specific final result you want.
select col3, col_group
from table
UNPIVOT ( col_group for value in ( col1,col2,col4,col5))
SELECT * FROM table
WHERE id IN (SELECT MIN(id) FROM table WHERE col3='200' GROUP BY col1)

Issue in joining 2 datasets

I have two datasets like below:
1:
+---------------------------+
| Id | Col1 | Col2 | Col3 |
+---------------------------+
| 1 | abc | 0 | 01/01/2010 |
| 2 | def | 10 | 10/10/2011 |
+---------------------------+
2:
+-------------------------------------------+
| Id | Col4 | Col5 | Col6 |
+-------------------------------------------+
| 1 | abc | 0 | 01/01/2010 |
| 5 | xyz | 12 | 5/6/2013 |
+-------------------------------------------+
Now I want to combine both these into a single dataset which shows something like this:
+----------------------------------------------------------------------+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+----------------------------------------------------------------------+
| 1 | abc | 0 | 01/01/2010 | abc | 0 | 01/01/2010 |
| 2 | def | 10 | 10/10/2011 | null | null | null |
| 5 | null | null | null | xyz | 12 | 5/6/2013 |
+----------------------------------------------------------------------+
The issue is that not all ids in dataset 1 are in dataset 2, and vice versa. What I need is all the data from datasets 1 and 2, with rows that share an id merged into a single row (dataset 2's columns placed next to dataset 1's), as shown above. I have used a pipe as a separator.
Any input is highly appreciated. I tried everything, like full outer join, inner join, CTE, etc.; nothing is working.
CREATE TABLE #TEMP1 (ID INT, Col1 VARCHAR(100), Col2 INT, Col3 DATETIME)
CREATE TABLE #TEMP2 (ID INT, Col4 VARCHAR(100), Col5 INT, Col6 DATETIME)
INSERT INTO #TEMP1 VALUES (1,'abc',0,'1/1/2010')
INSERT INTO #TEMP1 VALUES (1,'def',0,'1/1/2010')
INSERT INTO #TEMP2 VALUES (1,'abc',0,'1/1/2010')
INSERT INTO #TEMP2 VALUES (1,'def',0,'1/1/2010')
SELECT DISTINCT A.ID,A.Col1,A.Col2,A.Col3,B.Col4,B.Col5,B.Col6
FROM #TEMP1 A
FULL OUTER JOIN #TEMP2 B ON A.ID = B.ID
Thanks.
Try using the SQL below:
select t1.Id , Col1 , Col2 , Col3 , Col4 , Col5 , Col6
from temp1 t1 left join temp2 t2
on t1.Id=t2.Id
union
select t2.Id , Col1 , Col2 , Col3 , Col4 , Col5 , Col6
from temp1 t1 right join temp2 t2
on t1.Id=t2.Id
Also, I tried it on a fiddle for you:
http://sqlfiddle.com/#!2/d60a1e/5
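Here is a runnable sketch of the same "union of two outer joins" idea on the question's two datasets, using Python's sqlite3 module. Two assumptions: the RIGHT JOIN is rewritten as a LEFT JOIN with the tables swapped (older SQLite versions lack RIGHT JOIN), and the dates are kept as plain strings exactly as written in the question.

```python
import sqlite3

# Emulates a full outer join: every temp1 row plus every temp2 row,
# with UNION removing the matched rows that appear in both halves.
FULL_JOIN_SQL = """
SELECT t1.id, t1.col1, t1.col2, t1.col3, t2.col4, t2.col5, t2.col6
FROM temp1 AS t1 LEFT JOIN temp2 AS t2 ON t1.id = t2.id
UNION
SELECT t2.id, t1.col1, t1.col2, t1.col3, t2.col4, t2.col5, t2.col6
FROM temp2 AS t2 LEFT JOIN temp1 AS t1 ON t1.id = t2.id
ORDER BY 1
"""


def combined_rows():
    """Build both sample datasets in memory and return the combined rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp1 (id INTEGER, col1 TEXT, col2 INTEGER, col3 TEXT)")
    conn.execute("CREATE TABLE temp2 (id INTEGER, col4 TEXT, col5 INTEGER, col6 TEXT)")
    conn.executemany(
        "INSERT INTO temp1 VALUES (?, ?, ?, ?)",
        [(1, "abc", 0, "01/01/2010"), (2, "def", 10, "10/10/2011")],
    )
    conn.executemany(
        "INSERT INTO temp2 VALUES (?, ?, ?, ?)",
        [(1, "abc", 0, "01/01/2010"), (5, "xyz", 12, "5/6/2013")],
    )
    rows = conn.execute(FULL_JOIN_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in combined_rows():
        print(row)
```

Taking the id from whichever table drives each half is what keeps id 5 from coming back as NULL, which is a common pitfall when selecting only t1.id from a full outer join.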

display records based on ranks and also delete duplicated data

I have a table like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 2 | C | Y | 3 |
| | A | X | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
(5 rows)
I need output like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
so I wrote the query below:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1;
but it's not giving the proper output.
Try this:
select a.col1,a.col2,a.col3,max(a.rank) as rank
from [dbo].[5] a join [dbo].[5] b
on a.col1=b.col1 group by a.col1,a.col2,a.col3
Looks like you need aggregation with max():
select
col1,col2,col3,
max(rnk)
from table1
group by col1,col2,col3
If you could have different values of col1 for one combination of col2, col3, then distinct on is what you need:
select distinct on (col2, col3)
col1,col2,col3,
rnk
from table1
order by col2, col3, rnk desc
sql fiddle demo
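DISTINCT ON is PostgreSQL-specific, so as a portable illustration of the difference between the two queries above, here is a sketch using Python's sqlite3 module (assuming SQLite 3.25+). The DISTINCT ON query is approximated with ROW_NUMBER(), which is a substitution made for this example, and the column is named rnk as in the answer.

```python
import sqlite3

ROWS = [
    (1, "A", "X", 4),
    (2, "C", "Y", 3),
    (2, "C", "Y", 3),
    (None, "A", "X", 3),
    (1, "B", "Z", 2),
]

# Plain aggregation: the NULL col1 row forms its own group and survives.
GROUP_BY_SQL = """
SELECT col1, col2, col3, MAX(rnk) FROM table1
GROUP BY col1, col2, col3
"""

# One row per (col2, col3), keeping the highest rnk; stands in for DISTINCT ON.
TOP_PER_PAIR_SQL = """
SELECT col1, col2, col3, rnk FROM (
    SELECT col1, col2, col3, rnk,
           ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY rnk DESC) AS rn
    FROM table1
) AS ranked
WHERE rn = 1
ORDER BY rnk DESC
"""


def run(sql):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT, rnk INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(GROUP_BY_SQL))
    print(run(TOP_PER_PAIR_SQL))
```

On this data the GROUP BY version returns four rows (the NULL col1 row stays), while the per-pair version returns the three rows the question asks for.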
The following should match what you are looking for:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1
WHERE col1 IS NOT NULL
GROUP BY 1, 2, 3, 4;
You can also use positional references (column numbers) in your ORDER BY clause if you want one.

TSQL select the value from two rows that has higher priority and is not null

I am trying to consolidate two rows of the same table, where each row has a priority.
The value of interest is the value having priority 1 if it is not NULL; otherwise the value with priority 0.
An example data source could be:
| Id | GroupId | Priority | Col1 | Col2 | Col3 | ... | Coln |
-----------------------------------------------------------------
| 1 | 1 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 2 | 1 | 1 | NULL | NULL | 2.83 | ... | bar |
| 3 | 2 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 4 | 2 | 1 | 23 | NULL | 2.83 | ... | NULL |
and I want to have:
| GroupId | Col1 | Col2 | Col3 | ... | Coln |
-------------------------------------------------
| 1 | NULL | 4711 | 2.83 | ... | bar |
| 2 | 23 | 4711 | 2.83 | ... | f00 |
Is there a generic way in TSQL without the need to check each column explicitly?
SELECT
t1.GroupId,
ISNULL(t2.Col1, t1.Col1) as Col1,
ISNULL(t2.Col2, t1.Col2) as Col2,
ISNULL(t2.Col3, t1.Col3) as Col3,
...
ISNULL(t2.Coln, t1.Coln) as Coln
FROM mytable t1
JOIN mytable t2 ON t1.GroupId = t2.GroupId
WHERE
t1.Priority = 0 AND
t2.Priority = 1
Regards
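One possible way to avoid spelling out every column is to generate the null-fallback expressions from the table's metadata. The sketch below does this with Python's sqlite3 module, using COALESCE in place of ISNULL and PRAGMA table_info to list the columns; the aliases lo and hi and the helper name are invented for the example. In T-SQL the same idea would presumably rely on dynamic SQL built from a catalog view such as sys.columns, which is not shown here.

```python
import sqlite3

# Columns that identify a row rather than carry a value to merge.
KEY_COLUMNS = {"Id", "GroupId", "Priority"}

ROWS = [
    (1, 1, 0, None, 4711, 3.41, "f00"),
    (2, 1, 1, None, None, 2.83, "bar"),
    (3, 2, 0, None, 4711, 3.41, "f00"),
    (4, 2, 1, 23, None, 2.83, None),
]


def merged_rows():
    """Merge the priority-0 and priority-1 rows of each group, column by column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (Id INTEGER, GroupId INTEGER, Priority INTEGER,"
        " Col1 INTEGER, Col2 INTEGER, Col3 REAL, Coln TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

    # Read the column names from the schema (index 1 of each row is the name).
    value_columns = [
        info[1]
        for info in conn.execute("PRAGMA table_info(mytable)")
        if info[1] not in KEY_COLUMNS
    ]
    # Prefer the priority-1 value, fall back to priority 0 when it is NULL.
    select_list = ", ".join(
        f"COALESCE(hi.{col}, lo.{col}) AS {col}" for col in value_columns
    )
    sql = (
        f"SELECT lo.GroupId, {select_list} "
        "FROM mytable AS lo JOIN mytable AS hi ON lo.GroupId = hi.GroupId "
        "WHERE lo.Priority = 0 AND hi.Priority = 1 "
        "ORDER BY lo.GroupId"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in merged_rows():
        print(row)
```

The column names come from the schema rather than from user input, which is what makes interpolating them into the SQL string acceptable in this sketch.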
I'll elaborate on the ROW_NUMBER() solution that @KM suggested, since IMO it's the best solution for this. (In CTE form for easier readability)
WITH cte AS (
SELECT
t1.GroupId,
t1.Col1,
t1.Col2,
ROW_NUMBER() OVER(PARTITION BY t1.GroupId ORDER BY t1.Priority DESC) AS [row_id]
FROM
mytable t1
)
SELECT
*
FROM
cte
WHERE
row_id = 1
That will give you the row with the highest priority (according to your rules) for each GroupId in mytable.
ROW_NUMBER and RANK are two of my favorite TSQL tricks. http://msdn.microsoft.com/en-us/library/ms186734.aspx
edit: Another favorite of mine is PIVOT/UNPIVOT which you can use to transpose rows/columns which is another way of going about this type of problem. http://msdn.microsoft.com/en-us/library/ms177410.aspx
I think this would do what you are asking for without using isnull for every column
select
*
from
mytable t1
where
priority=(select max(priority) from mytable where groupid=t1.groupid group by groupid)