SQL: Eliminate rows that have duplicate values in one column

I have a table that may contain duplicate values in one column. For each distinct value in that column, I need to select only the row with the smallest index. I have tried many combinations of DISTINCT, MIN() and GROUP BY but have not been able to figure this one out. This query will run on SQL Server 2008.
color | index | user_id | organization_code
blue | 44 | xxx | yyy
blue | 66 | xxx | yyy
red | 12 | aaa | bbb
white | 55 | ccc | ddd
white | 68 | xxx | yyy
The query would return the first, third and fourth rows.
blue | 44 | xxx | yyy
red | 12 | aaa | bbb
white | 55 | ccc | ddd

Avoid using reserved keywords such as INDEX as column names. Use a window function for this problem; see the example below:
SELECT color, [index], [user_id], organization_code
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY color ORDER BY [index]) AS ranker
    FROM your_table  -- "table" is a reserved word; use the real table name here
) AS Z
WHERE ranker = 1;
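To check the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch. It runs the same query against an in-memory SQLite database through Python's standard sqlite3 module. This assumes SQLite 3.25 or later, which added window functions. The table name colors is hypothetical. SQLite also accepts the SQL Server-style [index] bracket quoting.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colors (color TEXT, [index] INTEGER, user_id TEXT, organization_code TEXT)"
)
conn.executemany(
    "INSERT INTO colors VALUES (?, ?, ?, ?)",
    [
        ("blue", 44, "xxx", "yyy"),
        ("blue", 66, "xxx", "yyy"),
        ("red", 12, "aaa", "bbb"),
        ("white", 55, "ccc", "ddd"),
        ("white", 68, "xxx", "yyy"),
    ],
)

# Number the rows inside each color by ascending [index], then keep only row 1.
query = """
SELECT color, [index], user_id, organization_code
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY color ORDER BY [index]) AS ranker
    FROM colors
) AS z
WHERE ranker = 1
ORDER BY color
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike a GROUP BY with MIN([index]), this keeps every column of the winning row in a single pass. It also does not need a join back to the table.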

Related

How to separate column values by condition (pivot) to fill one row

I have two tables that I'd like to FULL OUTER JOIN so that the resulting view splits the values table into two columns, one per type, with one row for each name_id. My current approach uses a CASE expression to select by type. I then use pandas to fill in the values and return distinct name_ids.
Name Table
name_id | name
1 | foo
2 | bar
3 | doo
4 | sue
Values Table
name_id | value | type
1 | 90 | red
2 | 95 | blue
3 | 33 | red
3 | 35 | blue
4 | 60 | blue
4 | 20 | red
This is a condensed version. In my full table, I need to do this twice, with two separate value tables split by type: red/blue and control/placebo.
Simple Join
SELECT names_table.name_id, name, value, type
FROM names_table
FULL OUTER JOIN values_table
ON names_table.name_id = values_table.name_id
WHERE type IN ('red', 'blue')
name_id | name | value | type
1 | foo | 90 | red
2 | bar | 95 | blue
3 | doo | 33 | red
3 | doo | 35 | blue
4 | sue | 60 | blue
4 | sue | 20 | red
Current workaround, whose result I then fix with Python and pandas:
SELECT names_table.name_id, name,
CASE
WHEN type = 'blue' THEN value END AS blue,
CASE
WHEN type = 'red' THEN value END AS red
FROM names_table
FULL OUTER JOIN values_table
ON names_table.name_id = values_table.name_id
name_id | name | blue | red
1 | foo | Null | 90
2 | bar | 95 | Null
3 | doo | 35 | Null
3 | doo | Null | 33
4 | sue | 60 | Null
4 | sue | Null | 20
Below is my desired output. The types become columns, and there is only one row per unique name_id; in my full query, this would include columns from both value tables 1 and 2.
Desired Output
name_id | name | blue | red
1 | foo | Null | 90
2 | bar | 95 | Null
3 | doo | 35 | 33
4 | sue | 60 | 20
I have two tables that I'd like do a full outer join ...
Why would you? It is better to explain what you actually want to achieve than to describe the tool you assume is needed to implement it.
Simple pivoting with the aggregate FILTER clause. See:
Aggregate columns with additional (distinct) filters
SELECT name_id, n.name, v.blue, v.red
FROM  (
   SELECT name_id
        , min(value) FILTER (WHERE type = 'blue') AS blue
        , min(value) FILTER (WHERE type = 'red')  AS red
   FROM   values_table
   GROUP  BY 1
   ) v
LEFT   JOIN names_table n USING (name_id);
Produces your desired result.
The LEFT JOIN includes result rows even if no name is found.
A FULL [OUTER] JOIN would add names in the result that have no values at all. I think you really want a LEFT [OUTER] JOIN or even a plain [INNER] JOIN.
You can simply switch the JOIN type to fit your actual requirements. Because both tables use the identical column name "name_id", you can join with a USING clause. The unqualified name_id in the outer SELECT works for any join type.
Note how I aggregate first and join later. This is typically substantially faster. See:
Query with LEFT JOIN not returning rows for count of 0
If there can be duplicate values for "red" or "blue", you'll have to define how to deal with those.
For more involved queries, consider crosstab() from the tablefunc extension. See:
PostgreSQL Crosstab Query
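To see the aggregate-then-join pivot produce the desired output, here is a sketch that runs against in-memory SQLite through Python's sqlite3 module rather than PostgreSQL. It assumes SQLite 3.30 or later for the aggregate FILTER clause. The same pivot is also written with CASE inside the aggregate. That CASE spelling is the portable form, and it also works in SQL Server, which does not support FILTER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names_table (name_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE values_table (name_id INTEGER, value INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO names_table VALUES (?, ?)",
    [(1, "foo"), (2, "bar"), (3, "doo"), (4, "sue")],
)
conn.executemany(
    "INSERT INTO values_table VALUES (?, ?, ?)",
    [(1, 90, "red"), (2, 95, "blue"), (3, 33, "red"),
     (3, 35, "blue"), (4, 60, "blue"), (4, 20, "red")],
)

# Aggregate first (one row per name_id), then attach the names.
filter_query = """
SELECT name_id, n.name, v.blue, v.red
FROM (
    SELECT name_id,
           min(value) FILTER (WHERE type = 'blue') AS blue,
           min(value) FILTER (WHERE type = 'red')  AS red
    FROM values_table
    GROUP BY name_id
) AS v
LEFT JOIN names_table AS n USING (name_id)
ORDER BY name_id
"""

# Same pivot using CASE inside the aggregate, for engines without FILTER.
case_query = """
SELECT name_id, n.name, v.blue, v.red
FROM (
    SELECT name_id,
           min(CASE WHEN type = 'blue' THEN value END) AS blue,
           min(CASE WHEN type = 'red'  THEN value END) AS red
    FROM values_table
    GROUP BY name_id
) AS v
LEFT JOIN names_table AS n USING (name_id)
ORDER BY name_id
"""

filtered = conn.execute(filter_query).fetchall()
portable = conn.execute(case_query).fetchall()
for row in filtered:
    print(row)  # missing types come back as None (SQL NULL)
```

min() is only a way to collapse each group to one value here. If a name_id could have several "red" rows, you would still need to decide which one you want.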

How to transform data rows into new column?

table
id text
1 aaa
121 bbb
4 ccc
1 ddd
new table
id text2
1 aaaddd
121 bbb
4 ccc
I don't think I can use PIVOT, because I never know how many id and text values there will be, or what they are, so I cannot hardcode them in a PIVOT clause.
Use GROUP BY with STRING_AGG:
SELECT id, STRING_AGG(text, '') AS text2
FROM your_table  -- "table" is a reserved word; use the real table name here
GROUP BY id;
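Here is a small sketch of the same grouping in SQLite, run from Python's sqlite3 module. SQLite spells STRING_AGG as group_concat(X, separator). The table name items is hypothetical. The order of concatenation within a group is not guaranteed, so the test accepts either "aaaddd" or "dddaaa". To pin the order, SQL Server 2017+ accepts STRING_AGG(text, '') WITHIN GROUP (ORDER BY text), and PostgreSQL accepts string_agg(text, '' ORDER BY text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; "table" itself is a reserved word.
conn.execute("CREATE TABLE items (id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "aaa"), (121, "bbb"), (4, "ccc"), (1, "ddd")],
)

# One output row per id; the texts of that id are glued together with ''.
rows = conn.execute(
    "SELECT id, group_concat(text, '') AS text2 FROM items GROUP BY id ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Because the grouping is driven by the data, this handles any number of ids and texts without hardcoding them, which is the problem with PIVOT described in the question.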

SQL Auto Number rows with grouping

I'm not sure this can even be done, but I need to number rows based on specific columns rather than with a simple ROW_NUMBER(), because I need a specific pattern and am not sure how to produce it. Below is the result I am trying to get in SQL.
COL_1 COL_2 DESIRED RESULT
AAA AAA 0
AAA BBB 1
AAA BBB 1
AAA CCC 2
AAA DDD 3
ABB ABB 0
ABB BBB 1
ABB CCC 2
ABB CCC 2
ABB DDD 3
I interpret this as wanting to enumerate the values of col_2 within col_1.
If so, you can use dense_rank():
select t.*,
dense_rank() over (partition by col_1 order by col_2) - 1 as ranking
from t;
This assumes that the ordering you want is based on the col_2 column within each col_1.
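Here is a quick way to confirm that DENSE_RANK() - 1 reproduces the desired column. The sketch runs the query on the sample data in in-memory SQLite (3.25+ for window functions) through Python's sqlite3 module. The table name t matches the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_1 TEXT, col_2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("AAA", "AAA"), ("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),
        ("ABB", "ABB"), ("ABB", "BBB"), ("ABB", "CCC"), ("ABB", "CCC"), ("ABB", "DDD"),
    ],
)

# DENSE_RANK gives equal col_2 values the same number without leaving gaps;
# subtracting 1 makes each col_1 group start at 0.
rows = conn.execute(
    """
    SELECT t.*,
           dense_rank() OVER (PARTITION BY col_1 ORDER BY col_2) - 1 AS ranking
    FROM t
    ORDER BY col_1, col_2
    """
).fetchall()
for row in rows:
    print(row)
```

RANK() would not work here because it leaves gaps after ties (0, 1, 1, 3, ...). ROW_NUMBER() would not work either, because it would give the two "BBB" rows different numbers.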

SQL SERVER - Change column value if value exists

I have the following problem.
Imagine a SELECT statement returns the following table:
Column A Column B
100 aaa
100 bbb
100 ccc
200 ddd
300 eee
So the question is: how can I change my SELECT statement to add a new column that numbers each occurrence of a repeated Column A value? The catch is that the numbering must restart for each group of Column A and follow a specific order.
For example, it should return something like:
Column A Column B Column C
100 aaa 1
100 bbb 2
100 ccc 3
200 ddd 1
300 eee 1
Thank you very much for your support!
This is the classic use case for the analytic RANK() function:
SELECT a, b, RANK() OVER (PARTITION BY a ORDER BY b) AS c
FROM my_table
Add ROW_NUMBER() OVER (PARTITION BY ColA ORDER BY SomethingElse) as ColC. That gives you a sequential row number per "group" in ColA.
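The two answers agree on the sample data but differ when Column B repeats within a group. The sketch below runs both functions in in-memory SQLite (3.25+) through Python's sqlite3 module. It first uses the question's rows, then adds one hypothetical extra row (100, 'bbb') to create a tie.

```python
import sqlite3


def numbered(rows):
    """Return (a, b, RANK, ROW_NUMBER) for the given (a, b) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    return conn.execute(
        """
        SELECT a, b,
               RANK()       OVER (PARTITION BY a ORDER BY b) AS rnk,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS rn
        FROM my_table
        ORDER BY a, b, rn
        """
    ).fetchall()


data = [(100, "aaa"), (100, "bbb"), (100, "ccc"), (200, "ddd"), (300, "eee")]
plain = numbered(data)
for row in plain:
    print(row)

# Hypothetical duplicate of (100, 'bbb') to show how ties are handled.
tied = numbered(data + [(100, "bbb")])
for row in tied:
    print(row)
```

With the tie, RANK() repeats a number and skips the next one, while ROW_NUMBER() stays strictly sequential. If Column C must always read 1, 2, 3, ..., ROW_NUMBER() is the safer choice.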

Return only first item from a related group

I have a block of data like this:
RW | PK A B C D
============================
1 | 1 aa 123 x 99
2 | 2 aa 234 v 98
3 | 3 bb 321 z 11
4 | 4 bb 210 w 91
5 | 5 cc 456 y 55
How can I grab just the first item of each set (ID'd by column A), like so?
RW | A B C D
=======================
1 | aa 123 x 99
2 | bb 321 z 11
3 | cc 456 y 55
I can use GROUP BY or DISTINCT, but that is very inefficient with the data I'm working with, while running a straight list takes less than 100 ms. The two options above may also produce more than one instance of an item in column A, since the related values may differ.
In other words,
SELECT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
is very fast (less than a second), while
SELECT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
GROUP BY MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
and
SELECT DISTINCT MYTABLE.A, MYTABLE.B, MYTABLE.C, MYTABLE.D
FROM MYTABLE
take much longer (minutes; I have not let them complete).
I need no aggregate functions (COUNT, SUM, etc.), just a listing, once per item. The number of occurrences per value in column A varies, so I can't just grab every xth row.
Why don't I just run the list and use Excel or something like that to sort? I'm looking at a few million records to be returned, and I am not able to process so many records using any software that I am familiar with.
It sounds like you want something like
SELECT pk,
       a,
       b,
       c,
       d
FROM (SELECT pk,
             a,
             b,
             c,
             d,
             row_number() OVER (PARTITION BY a ORDER BY pk ASC) AS rnk
      FROM your_table) t
WHERE rnk = 1;
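Here is a minimal sketch that runs the ROW_NUMBER() answer on the question's sample rows. It uses in-memory SQLite (3.25+ for window functions) through Python's sqlite3 module as a stand-in for the asker's database. The table name your_table comes from the answer, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (pk INTEGER, a TEXT, b INTEGER, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "aa", 123, "x", 99),
        (2, "aa", 234, "v", 98),
        (3, "bb", 321, "z", 11),
        (4, "bb", 210, "w", 91),
        (5, "cc", 456, "y", 55),
    ],
)

# Keep the lowest-pk row of each group of column a; no GROUP BY or DISTINCT needed.
rows = conn.execute(
    """
    SELECT pk, a, b, c, d
    FROM (SELECT pk, a, b, c, d,
                 row_number() OVER (PARTITION BY a ORDER BY pk ASC) AS rnk
          FROM your_table) t
    WHERE rnk = 1
    ORDER BY a
    """
).fetchall()
for row in rows:
    print(row)
```

Each value of a appears exactly once even when b, c and d differ within the group. That addresses the duplicate problem the asker saw with DISTINCT.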
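Alternatively, in databases that expose a ROWID pseudocolumn, such as Oracle, try this. Note that in Oracle the lowest ROWID reflects physical storage, so it is not necessarily the row with the lowest PK:
SELECT * FROM your_table WHERE rowid IN (SELECT MIN(rowid) FROM your_table GROUP BY a);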
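SQLite also has an implicit rowid, so the MIN(rowid) variant can be tried from Python's sqlite3 module. This is a sketch under that assumption. Here pk is deliberately not declared INTEGER PRIMARY KEY, so SQLite assigns a separate rowid, which normally follows insertion order for these rows. In Oracle, MIN(ROWID) does not carry that ordering guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# pk is a plain column here, so SQLite keeps its own hidden rowid.
conn.execute("CREATE TABLE your_table (pk INTEGER, a TEXT, b INTEGER, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "aa", 123, "x", 99),
        (2, "aa", 234, "v", 98),
        (3, "bb", 321, "z", 11),
        (4, "bb", 210, "w", 91),
        (5, "cc", 456, "y", 55),
    ],
)

# For each value of a, keep the row whose rowid is smallest.
rows = conn.execute(
    """
    SELECT * FROM your_table
    WHERE rowid IN (SELECT min(rowid) FROM your_table GROUP BY a)
    ORDER BY a
    """
).fetchall()
for row in rows:
    print(row)
```

If "first" has to mean "lowest pk", the ROW_NUMBER() ... ORDER BY pk version states that intent explicitly instead of relying on storage order.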