drop duplicates on some columns and keep other columns values - sql

I have the following table with Postgres:
Id Col1 Col2 Col3
1 A 1 x
2 A 0 y
3 A 0 z
4 B 0 x
5 B 1 y
6 C 0 z
As part of a select query, I want to drop duplicates in Col1, keeping the row with the highest Col2 value (there will never be multiple highest values per Col1 value), along with the corresponding Col2 and Col3 values.
Desired output:
Id Col1 Col2 Col3
1 A 1 x
5 B 1 y
6 C 0 z

In Postgres, you can use distinct on:
select distinct on (col1) t.*
from t
order by col1, col2 desc;
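If you need the same result on an engine without distinct on, ROW_NUMBER() is one option. Here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database standing in for Postgres (an assumption made only to keep the example self-contained; it needs SQLite 3.25 or later for window functions). The alias rn is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", 1, "x"), (2, "A", 0, "y"), (3, "A", 0, "z"),
     (4, "B", 0, "x"), (5, "B", 1, "y"), (6, "C", 0, "z")],
)

# Number rows within each col1 group by col2 descending, then keep row 1.
rows = conn.execute("""
    SELECT id, col1, col2, col3
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY col1
""").fetchall()

for row in rows:
    print(row)
```

With the sample data this returns the rows with Id 1, 5 and 6, matching the desired output.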

Related

Sum of partition

Suppose my table has below 3 columns:
Col1 Col2 Col3
2 A A
3 B A
4 C A
4 D B
5 E B
6 F B
Output
Col1 Col2 Col3
9 A A
9 B A
9 C A
15 D B
15 E B
15 F B
I want to group by Col3 and show each group's sum of Col1 on every row of that group, so the first 3 rows should show 9, 9, 9 in the output. I also want to keep every Col2 value.
Select Sum(Col1), Col2, Col3 from Table
Group by Col2, Col3
I have tried GROUP BY and OVER (), but I am not able to get this output.
Kindly let me know how this can be achieved.
We can try using SUM() as an analytic function here:
SELECT SUM(Col1) OVER (PARTITION BY Col3) AS Col1, Col2, Col3
FROM yourTable
ORDER BY Col2;
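To check the query against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database is an assumption made only so the example runs on its own (window functions need SQLite 3.25 or later); the table name yourTable is the placeholder from the query above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Col1 INTEGER, Col2 TEXT, Col3 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(2, "A", "A"), (3, "B", "A"), (4, "C", "A"),
     (4, "D", "B"), (5, "E", "B"), (6, "F", "B")],
)

# SUM() OVER (PARTITION BY ...) keeps every row and repeats the group total.
rows = conn.execute("""
    SELECT SUM(Col1) OVER (PARTITION BY Col3) AS Col1, Col2, Col3
    FROM yourTable
    ORDER BY Col2
""").fetchall()

for row in rows:
    print(row)
```

Unlike GROUP BY, the window function does not collapse rows, so all six Col2 values survive and each carries its group's total (9 for group A, 15 for group B).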

How can I find groups with more than one row and list the rows in each such group?

I have a table "mytable" in a database.
Given a subset of the columns of the table, I would like to group by that subset and find the groups with more than one row.
For example, if the table is
col1 col2 col3
1 1 1
1 1 2
1 2 1
2 2 1
2 2 3
2 1 1
I am interested in finding the groups by col1 and col2 that have more than one row, which are:
col1 col2 col3
1 1 1
1 1 2
and
col1 col2 col3
2 2 1
2 2 3
I was wondering how to write a SQL query for that purpose?
Is the following the best way to do that?
First get the col1 and col2 values of such groups:
SELECT col1, col2, COUNT(*)
FROM mytable
GROUP BY col1, col2
HAVING COUNT(*) > 1
Then based on the output of the previous query, manually write a query for each group:
SELECT *
FROM mytable
WHERE col1 = val1 AND col2 = val2
If there are many such groups, then I will have to manually write many queries, which can be a disadvantage.
I am using SQL Server.
Thanks.
This is a common problem. One solution is to get the "keys" in a derived table and join to that to get the rows.
create table #test (col1 int, col2 int, col3 int)
insert into #test values (1,1,1),(1,1,2),(1,2,1),(2,2,1),(2,2,3),(2,1,1)
select t.*
from #test t
inner join (
select col1, col2
from #test
group by col1, col2
having count(*) > 1
) k
on k.col1 = t.col1 and k.col2 = t.col2
col1 col2 col3
----------- ----------- -----------
1 1 1
1 1 2
2 2 1
2 2 3
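The same derived-table join can be tried outside SQL Server. This is a minimal sketch using Python's built-in sqlite3 module; the in-memory SQLite database is an assumption made only so the example is self-contained, and mytable is the name from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 1), (2, 2, 3), (2, 1, 1)],
)

# The derived table k holds the (col1, col2) keys that occur more than once;
# joining back to the table returns every row belonging to those keys.
rows = conn.execute("""
    SELECT t.col1, t.col2, t.col3
    FROM mytable t
    INNER JOIN (
        SELECT col1, col2
        FROM mytable
        GROUP BY col1, col2
        HAVING COUNT(*) > 1
    ) k ON k.col1 = t.col1 AND k.col2 = t.col2
    ORDER BY t.col1, t.col2, t.col3
""").fetchall()

for row in rows:
    print(row)
```

A single query covers all groups, so nothing has to be written by hand per group.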
The window function sum() over() may help here
Example
with cte as (
Select *
,Cnt = sum(1) over (partition by Col1,Col2)
From YourTable
)
Select *
From cte
Where Cnt>=2
Another option (less performant)
Select top 1 with ties *
From YourTable
Order By case when sum(1) over (partition by Col1,Col2) > 1 then 1 else 2 end
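For reference, here is a minimal sketch of the CTE variant run through Python's built-in sqlite3 module. The in-memory SQLite database (3.25 or later) is an assumption made only so the example runs on its own. The T-SQL form Cnt = sum(1) over (...) is written with AS because SQLite does not accept the alias-first syntax, and top 1 with ties is SQL Server specific, so it is not reproduced.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Col1 INTEGER, Col2 INTEGER, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 1), (2, 2, 3), (2, 1, 1)],
)

# Attach the size of each (Col1, Col2) group to every row, then filter on it.
rows = conn.execute("""
    WITH cte AS (
        SELECT Col1, Col2, Col3,
               SUM(1) OVER (PARTITION BY Col1, Col2) AS Cnt
        FROM YourTable
    )
    SELECT Col1, Col2, Col3, Cnt
    FROM cte
    WHERE Cnt >= 2
    ORDER BY Col1, Col2, Col3
""").fetchall()

for row in rows:
    print(row)
```

Each returned row carries Cnt = 2, the size of its group.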

Count records in query in groups based on column value

Let's suppose I have a very simple query in SQL
SELECT Col1,Col2 From Table1
and it gives me result:
Col1 Col2
A 5
A 7
A 2
B 1
B 1
B 4
B 0
C 4
C 1
C 2
I want to number the rows within each Col1 group in the order given by Col2. If some rows in a group have equal Col2 values, they should still get different numbers, as shown in the example.
So I want to have
Col1 Col2 Nr
A 5 2
A 7 3
A 2 1
B 0 1
B 1 2
B 1 3
B 4 4
C 4 3
C 1 1
C 2 2
Any ideas how to make it?
If your database supports window functions, use ROW_NUMBER
select col1,col2,row_number() over(partition by col1 order by col2) as nr
from tablename
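If your database doesn't support window functions, use a correlated subquery (note that rows with equal col2 values in a group get the same number this way, so a unique tiebreaker column is needed to number them differently):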
select col1,col2,
(select count(*)+1 from tablename t1 where t1.col1=t.col1 and t1.col2<t.col2) as nr
from tablename t
You can use the row_number window function:
SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) AS Nr
FROM table1
ORDER BY 1, 2, 3
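To see how ROW_NUMBER() and the correlated-subquery approach differ on ties, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database (3.25 or later) is an assumption made only so the example is self-contained; the variable names numbered and ranked are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 5), ("A", 7), ("A", 2), ("B", 1), ("B", 1),
     ("B", 4), ("B", 0), ("C", 4), ("C", 1), ("C", 2)],
)

# Window function: every row in a group gets a distinct number.
numbered = conn.execute("""
    SELECT col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS nr
    FROM table1
    ORDER BY col1, col2, nr
""").fetchall()

# Correlated subquery: counts strictly smaller col2 values, so ties share a number.
ranked = conn.execute("""
    SELECT col1, col2,
           (SELECT COUNT(*) + 1 FROM table1 t1
            WHERE t1.col1 = t.col1 AND t1.col2 < t.col2) AS nr
    FROM table1 t
    ORDER BY col1, col2
""").fetchall()

print(numbered)
print(ranked)
```

The two B rows with Col2 = 1 get 2 and 3 from ROW_NUMBER(), but both get 2 from the subquery.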

Removing rows in SQL that have a duplicate column value

I have looked high and low on SO over the last couple of hours for an answer to this question (subqueries, CTEs, left joins with derived tables), but none of the solutions really meet my criteria.
I have a table with data like this :
COL1 COL2 COL3
1 A 0
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
8 B 1
Where column 1 is the primary key and is an int. Column 2 is nvarchar(max) and column 3 is an int. I have determined that by using this query:
select name, COUNT(name) as 'count'
FROM [dbo].[AppConfig]
group by Name
having COUNT(name) > 3
I can return the total counts of "A, B and C" only if they occur more than 3 times. I am now trying to remove every row that repeats an earlier column 2 / column 3 combination, keeping only the first occurrence. The sample table I provided would look like this now:
COL1 COL2 COL3
1 A 0
2 A 1
4 B 0
8 B 1
Could anyone assist me with this?
If all you want is the first row with a ColB-ColC combination, the following will do it:
select min(id) as id, colB, colC
from tbl
group by colB, colC
order by id
This should work:
;WITH numbered_rows as (
SELECT
Col1,
Col2,
Col3,
ROW_NUMBER() OVER(PARTITION BY Col2, Col3 ORDER BY Col1) as row
FROM AppConfig)
SELECT
Col1,
Col2,
Col3
FROM numbered_rows
WHERE row = 1
SELECT DISTINCT MIN(COL1) AS COL1,COL2,COL3
FROM TABLE
GROUP BY COL2,COL3
ORDER BY COL1
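The MIN() plus GROUP BY approach can be checked against the sample data with a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database is an assumption made only so the example runs on its own; the table name AppConfig comes from the question, the column names follow the sample table, and the alias first_id is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppConfig (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO AppConfig VALUES (?, ?, ?)",
    [(1, "A", 0), (2, "A", 1), (3, "A", 1), (4, "B", 0),
     (5, "B", 0), (6, "B", 0), (7, "B", 0), (8, "B", 1)],
)

# Keep the lowest key for each (col2, col3) combination.
rows = conn.execute("""
    SELECT MIN(col1) AS first_id, col2, col3
    FROM AppConfig
    GROUP BY col2, col3
    ORDER BY first_id
""").fetchall()

for row in rows:
    print(row)
```

This returns the rows with keys 1, 2, 4 and 8, matching the expected table. It only selects the surviving rows; deleting the others from the table would need a separate statement.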

oracle query for no data

I have table A with col1 ,col2 with data as
col1 col2
-----------
1 x
2 x
3 x
1 y
2 y
3 y
4 y
1 z
2 z
I want output as:
col1 col2
-----------
1 x
2 x
3 x
4 x
1 y
2 y
3 y
4 y
1 z
2 z
3 z
4 z
Even if a col2 value has no row for the maximum col1 value (4 here), the query should still display rows up to 4 for it.
SELECT A.col1, B.col2
FROM (SELECT DISTINCT col1 FROM YourTable) A
CROSS JOIN (SELECT DISTINCT col2 FROM YourTable) B
If you want the cartesian product of each possible combination of values in col1 and col2:
Select col1, col2 from
(select distinct col1 from sourcetable) t1
Cross join
(select distinct col2 from sourcetable) t2
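Here is a minimal sketch of the cross join run against the sample data with Python's built-in sqlite3 module. The in-memory SQLite database is an assumption standing in for Oracle, made only so the example is self-contained; the ORDER BY is added so the output follows the layout in the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcetable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO sourcetable VALUES (?, ?)",
    [(1, "x"), (2, "x"), (3, "x"),
     (1, "y"), (2, "y"), (3, "y"), (4, "y"),
     (1, "z"), (2, "z")],
)

# Pair every distinct col1 value with every distinct col2 value.
rows = conn.execute("""
    SELECT t1.col1, t2.col2
    FROM (SELECT DISTINCT col1 FROM sourcetable) t1
    CROSS JOIN (SELECT DISTINCT col2 FROM sourcetable) t2
    ORDER BY t2.col2, t1.col1
""").fetchall()

for row in rows:
    print(row)
```

With 4 distinct col1 values and 3 distinct col2 values this yields 12 rows, including the missing (4, x), (3, z) and (4, z).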