Oracle: how to select rows that are not duplicated

I have to eliminate every Id that has an X in Value and is also repeated. If an Id has no X, its rows stay; if it has an X but is not repeated, it stays too. Can you please help?
Table
Id Value
1 A
2 X
3 X
3 C
3 D
4 X
4 F
5 G
6 Z
7 X
8 X
8 G
Result from query should be:
1 A
2 X
5 G
6 Z
7 X

Maybe that helps:
SELECT Id,
Value
FROM(SELECT Id,
Value,
COUNT(*) OVER (PARTITION BY Id) cnt,
SUM(DECODE(Value, 'X', 1, 0)) OVER (PARTITION BY Id) sumx
FROM your_table
)
WHERE cnt = 1 OR sumx = 0;
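To check this logic against the sample data, here is a minimal sketch that runs the same window-function query in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Oracle here, and CASE replaces Oracle's DECODE, which SQLite does not have.

```python
import sqlite3

# Sample rows from the question: (Id, Value).
rows = [(1, "A"), (2, "X"), (3, "X"), (3, "C"), (3, "D"), (4, "X"),
        (4, "F"), (5, "G"), (6, "Z"), (7, "X"), (8, "X"), (8, "G")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)

# cnt  = number of rows sharing the Id
# sumx = number of 'X' values among the rows sharing the Id
query = """
SELECT id, value
FROM (SELECT id, value,
             COUNT(*) OVER (PARTITION BY id) AS cnt,
             SUM(CASE WHEN value = 'X' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY id) AS sumx
      FROM your_table)
WHERE cnt = 1 OR sumx = 0
ORDER BY id, value
"""
kept = conn.execute(query).fetchall()
print(kept)
```

Ids 3, 4 and 8 are dropped because each is repeated and contains an X; every other Id is kept.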

Isn't that sufficient?
SELECT "Id", MIN("Value") "Value" FROM T
GROUP BY "Id" HAVING COUNT("Id") = 1
ORDER BY "Id"
It will discard multiple only-non-X values for a given id too. At the very least, according to your sample data, it seems to produce the desired result:
ID VALUE
1 A
2 X
5 G
6 Z
7 X
See http://sqlfiddle.com/#!4/81baa/10
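The caveat about discarding repeated Ids that contain no X can be seen with a small sketch. It uses an in-memory SQLite database from Python as a stand-in for Oracle, and Id 9 with values G and H is hypothetical data added only to show the difference; it is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    # Id 9 is a made-up case: repeated, but without any 'X'.
    [(1, "A"), (2, "X"), (3, "X"), (3, "C"), (9, "G"), (9, "H")],
)

# GROUP BY variant: keeps only Ids that occur exactly once.
group_by_result = conn.execute("""
    SELECT id, MIN(value) FROM t
    GROUP BY id HAVING COUNT(id) = 1
    ORDER BY id
""").fetchall()

# Rule as stated in the question: keep an Id if it is unique OR has no 'X'.
rule_result = conn.execute("""
    SELECT id, value FROM t
    WHERE id IN (SELECT id FROM t
                 GROUP BY id
                 HAVING COUNT(*) = 1
                     OR SUM(CASE WHEN value = 'X' THEN 1 ELSE 0 END) = 0)
    ORDER BY id, value
""").fetchall()

print(group_by_result)
print(rule_result)
```

The GROUP BY variant drops Id 9 entirely, while the stated rule keeps both of its rows. On the question's own sample data the two agree, because no Id there is repeated without an X.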

Related

SQL Query get common column with diff values in other columns

I am not very fluent in SQL, and I'm having trouble writing an efficient query for the following. I have a table with a composite key of columns A and B, as shown below:
A B C
1 1 4
1 2 5
1 3 3
2 2 4
2 1 5
3 1 4
So what I need is to find rows where column C has both values of 4 and 5 (4 and 5 are just examples) for a particular value of column A. So 4 and 5 are present for two A values (1 and 2). For A value 3, 4 is present but 5 is not, hence we cannot take it.
My explanation is so confusing. I hope you get it.
After this, I need to find only those where the B value for 4 (first number) is less than the B value for 5 (second number). In this case, for A=1, Row 1 (A=1, B=1, C=4) has a lower B value than Row 2 (A=1, B=2, C=5), so we take it. For A=2, Row 1 (A=2, B=2, C=4) has a greater B value than Row 2 (A=2, B=1, C=5), hence we cannot take it.
I hope someone gets it and can help. Thanks.
Rows containing both c=4 and c=5 for a given a and ordered by b and by c the same way.
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
row_number() over (partition by a order by c) crn
from tbl
where c in (4, 5)
) t
where cnt = 2 and brn = crn;
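As a quick check, the sketch below runs that query on the question's six rows in an in-memory SQLite database (3.25 or later, for window functions) driven from Python. SQLite is only an assumption for the sake of a runnable example; the question does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, 4), (1, 2, 5), (1, 3, 3), (2, 2, 4), (2, 1, 5), (3, 1, 4)],
)

# brn ranks the rows of each a by b, crn ranks them by c.
# brn = crn on every kept row means b and c sort the same way.
query = """
select a, b, c
from (
    select tbl.*,
           count(*) over (partition by a) cnt,
           row_number() over (partition by a order by b) brn,
           row_number() over (partition by a order by c) crn
    from tbl
    where c in (4, 5)
) t
where cnt = 2 and brn = crn
order by a, b
"""
matches = conn.execute(query).fetchall()
print(matches)
```

Only the two rows for a = 1 come back: a = 2 has both values but in the opposite b order, and a = 3 has no row with c = 5.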
EDIT
If the order of the parameters matters, the position of each parameter must be set explicitly. The query below compares the b ordering to the explicit parameter position:
with params(val, pos) as (
select 4,2 union all
select 5,1
)
select a, b, c
from (
select tbl.*,
count(*) over(partition by a) cnt,
row_number() over (partition by a order by b) brn,
p.pos
from tbl
join params p on tbl.c = p.val
) t
where cnt = (select count(*) from params) and brn = pos;
I assume you want the values of a where this is true. If so, you can use aggregation:
select a
from t
where c in (4, 5)
group by a
having count(distinct c) = 2;
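The aggregation can be tried on the question's data with the sketch below (in-memory SQLite from Python, used here only as a convenient stand-in). The second query is my own extension, not part of the answer above: it adds a conditional-aggregation test for the "B of 4 is less than B of 5" requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 4), (1, 2, 5), (1, 3, 3), (2, 2, 4), (2, 1, 5), (3, 1, 4)],
)

# Values of a that have both c = 4 and c = 5.
has_both = conn.execute("""
    select a from t
    where c in (4, 5)
    group by a
    having count(distinct c) = 2
    order by a
""").fetchall()

# Assumed extension: additionally require b(c=4) < b(c=5).
# CASE without ELSE yields NULL, which MAX/MIN ignore.
ordered = conn.execute("""
    select a from t
    where c in (4, 5)
    group by a
    having count(distinct c) = 2
       and max(case when c = 4 then b end) < min(case when c = 5 then b end)
    order by a
""").fetchall()

print(has_both, ordered)
```

The first query returns a = 1 and a = 2; the second narrows that to a = 1, matching the walkthrough in the question.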

Use alias query as a table

I'm trying to get exclusive max values from a query.
My first query (raw data) is something like that:
Material¦Fornecedor
X B
X B
X B
X C
X C
Y B
Y D
Y D
First, I need to build the max-values query for the table above. For that, I need to count consecutive rows that share the same Material and Fornecedor; that is, keep counting until SQL finds a row with a different Material and Fornecedor.
After that, I get the result shown below (max_line is the number of rows found with the same Material and Fornecedor):
max_line¦Material¦Fornecedor
3 X B
2 X C
1 Y B
2 Y D
In the end, I need the row with the highest max_line for each distinct Material. The result of the query that I need to construct, based on the table above, should look like this:
max_line¦Material¦Fornecedor
3 X B
2 Y D
My code so far is shown below:
select * from
(SELECT max(w2.line) as max_line, w2.Material, w2.[fornecedor] FROM
(SELECT w.Material, ROW_NUMBER() OVER(PARTITION BY w.Material, w.[fornecedor]
ORDER BY w.[fornecedor] DESC) as line, w.[fornecedor]
FROM [Database].[dbo].['Table1'] w) as w2
group by w2.Material, w2.[fornecedor]) as w1
inner join (SELECT w1.Material, MAX(w1.max_line) AS maximo FROM w1 GROUP BY w1.material) as w3
ON w1.Material = w3.Material AND w1.row = w3.maximo
I'm stuck on the inner join, since I can't alias a query and then reuse that alias inside the inner join.
Could you, please, help me?
Thank you,
Use a window function to find the max row number then filter by it.
SELECT w1.MAXROW, w1.Material, w1.[fornecedor]
FROM (
    SELECT w2.Material, w2.[fornecedor]
        , w2.[ROW] -- exposed so the outer WHERE can reference it
        , max(w2.[ROW]) over (partition by w2.Material) MAXROW
    FROM (
        SELECT w.Material, w.[fornecedor]
            , ROW_NUMBER() OVER (PARTITION BY w.Material, w.[fornecedor] ORDER BY w.[fornecedor] DESC) as [ROW]
        FROM [Database].[dbo].['Table1'] w
    ) AS w2
) AS w1
WHERE w1.[ROW] = w1.MAXROW;
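The window-function approach can be exercised on the question's sample rows with the sketch below. It uses an in-memory SQLite database (3.25 or later) from Python as a stand-in for SQL Server; the table name table1 and the column alias rn are placeholders of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (material TEXT, fornecedor TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("X", "B"), ("X", "B"), ("X", "B"), ("X", "C"), ("X", "C"),
     ("Y", "B"), ("Y", "D"), ("Y", "D")],
)

# Innermost query: number the rows inside each (material, fornecedor) pair.
# Middle query: carry rn along and compute the highest rn per material.
# Outer query: keep only the row whose rn equals that maximum.
query = """
SELECT maxrow, material, fornecedor
FROM (SELECT material, fornecedor, rn,
             MAX(rn) OVER (PARTITION BY material) AS maxrow
      FROM (SELECT material, fornecedor,
                   ROW_NUMBER() OVER (PARTITION BY material, fornecedor
                                      ORDER BY fornecedor) AS rn
            FROM table1))
WHERE rn = maxrow
ORDER BY material
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

Note that if two suppliers tie for the highest count within one material, this pattern returns a row for each of them.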

Counting the amount of object with the same value in another column

I want to count the occurrences of each value and then attach that count back to each row, and I couldn't find a good way to do it.
So 1 table would look like
id | value
1 | a
2 | a
3 | b
4 | a
5 | b
6 | b
7 | c
8 | c
9 | a
which I would like to result in:
id | value | count
1 | a | 4
2 | a | 4
3 | b | 3
4 | a | 4
5 | b | 3
6 | b | 3
7 | c | 2
8 | c | 2
9 | a | 4
I can only find answers with group by so any help is appreciated. This should also be matched to another table so if the result is joinable that would be helpful as well.
If your RDBMS supports window functions, there is no need to join: you can just do a window count:
select t.*, count(*) over(partition by value) cnt from mytable t
select t.id, t.value, tmp.cnt
from your_table t
join
(
select value, count(*) as cnt
from your_table
group by value
) tmp on tmp.value = t.value
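Both variants can be compared side by side with the sketch below, which loads the question's nine rows into an in-memory SQLite database (3.25 or later for the window version) from Python. SQLite is an assumption on my part, since the question does not name an RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    list(enumerate("aababbcca", start=1)),  # ids 1..9 with the question's values
)

# Variant 1: window count, no join needed.
window_rows = conn.execute("""
    select t.id, t.value, count(*) over (partition by t.value) cnt
    from mytable t
    order by t.id
""").fetchall()

# Variant 2: aggregate per value, then join the counts back to each row.
join_rows = conn.execute("""
    select t.id, t.value, tmp.cnt
    from mytable t
    join (select value, count(*) as cnt
          from mytable
          group by value) tmp on tmp.value = t.value
    order by t.id
""").fetchall()

print(window_rows == join_rows)
print(window_rows)
```

Either result can be joined to another table on id or value in the usual way, since each original row is preserved.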

SQL: Removing Duplicates in one column while retaining the row with highest value in another column

I am using Teradata and am stuck trying to write some code... I would like to remove the rows in which columnB has a duplicate value, based on the values in ColumnA - if anyone can help me that would be great!
I have a sequential number in columnA and would like to retain the row with the highest value in columnA.
E.g., in the table below I would like to retain rows 9, 7, 6 and 2, because although they have a duplicate in column 2, they have the highest columnA value for that letter.
Table name: DataTable
Column1 Column2 Column3 Column4 Column5
1 B X X X
2 A Y Y Y
3 E Z Z Z
4 B X X X
5 C Y Y Y
6 E Z Z Z
7 C X X X
8 B Y Y Y
9 B Z Z Z
If you just want to select the rows, you can do:
select t.*
from t
where t.columnA = (select max(t2.columnA) from t t2 where t2.columnB = t.columnB);
If you actually want to remove them, then one method is:
delete from t
where t.columnA < (select max(t2.columnA) from t t2 where t2.columnB = t.columnB);
If you want to return those rows using a SELECT there's no need for a Correlated Subquery, OLAP-functions usually perform better:
select *
from tab
qualify
row_number() over (partition by ColumnB order by columnA DESC) = 1
If you actually want to DELETE the other rows go for Gordon's query.
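Here is a small sketch of both operations on the question's data, using an in-memory SQLite database from Python. SQLite has no QUALIFY clause, so the ROW_NUMBER filter is written as a subquery instead; that substitution, and the reduction of the table to two columns, are assumptions made to keep the example runnable outside Teradata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (columnA INTEGER, columnB TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "B"), (2, "A"), (3, "E"), (4, "B"), (5, "C"),
     (6, "E"), (7, "C"), (8, "B"), (9, "B")],
)

# SELECT variant: rank rows per columnB by columnA descending, keep rank 1.
kept = conn.execute("""
    SELECT columnA, columnB
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY columnB
                                    ORDER BY columnA DESC) AS rn
          FROM t)
    WHERE rn = 1
    ORDER BY columnA
""").fetchall()

# DELETE variant: remove every row that is below its group's maximum.
conn.execute("""
    DELETE FROM t
    WHERE columnA < (SELECT MAX(t2.columnA) FROM t AS t2
                     WHERE t2.columnB = t.columnB)
""")
remaining = conn.execute(
    "SELECT columnA, columnB FROM t ORDER BY columnA").fetchall()

print(kept)
print(remaining)
```

Both leave rows 2, 6, 7 and 9, which are the rows the question asks to retain.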

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
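A variation on this idea can be sketched in an in-memory SQLite database (3.25 or later) from Python. Unlike the query above, this version shuffles the numbers 1 to 10 separately inside each hh, so different households can receive different numbers; that per-household shuffle, and the use of a recursive CTE to generate the numbers, are my own assumptions and may need adapting for Netezza.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hh INTEGER, slot INTEGER, rn INTEGER)")
conn.executemany("INSERT INTO t (hh, slot) VALUES (?, ?)",
                 [(hh, slot) for hh in (1, 2) for slot in (1, 2, 3)])

# nums:     the candidate values 1..10
# shuffled: every household paired with every value, in a random order
#           per household; pos is that value's position in the shuffle
# The final join picks, for each slot, the value at position = slot.
query = """
WITH RECURSIVE nums(n) AS (
    SELECT 1 UNION ALL SELECT n + 1 FROM nums WHERE n < 10
),
shuffled AS (
    SELECT h.hh, nums.n,
           ROW_NUMBER() OVER (PARTITION BY h.hh ORDER BY random()) AS pos
    FROM (SELECT DISTINCT hh FROM t) h CROSS JOIN nums
)
SELECT t.hh, t.slot, s.n
FROM t JOIN shuffled s ON s.hh = t.hh AND s.pos = t.slot
ORDER BY t.hh, t.slot
"""
assigned = conn.execute(query).fetchall()
print(assigned)
```

Because pos is unique within each hh, two slots of the same household can never pick the same number. The approach requires at least as many candidate values as slots per household.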
I am not an expert on SQL, but you could probably do something like this:
1. Initialize a counter CNT=1.
2. Create a table such that you sample 1 row randomly from each group, along with a count of null RN, say C_NULL_RN.
3. With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN.
4. Increment CNT and go to step 2.
Well, I couldn't get a slick solution, so I did a hack:
1. Created a new integer field called rand_inst.
2. Assign a random number to each empty slot.
3. Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
4. Update the table to assign a different random number anywhere that rand_inst>1.
5. Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;
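The same retry loop can be sketched outside the database to make the control flow explicit. This is plain Python rather than Netezza SQL, and the function name and parameters are hypothetical; it mirrors the steps above (assign, mark repeats, redraw the repeats, stop when none remain).

```python
import random
from collections import Counter


def fill_without_repeats(households, slots, low, high, rng):
    """Assign a value in [low, high] to each (household, slot) pair so that
    no value repeats within a household. Needs high - low + 1 >= slots."""
    if high - low + 1 < slots:
        raise ValueError("not enough distinct values for the slots")

    # First pass: one random value per slot (the initial 'rnd' column).
    rnd = {(h, s): rng.randint(low, high)
           for h in households for s in range(1, slots + 1)}

    while True:
        seen = Counter()
        repeats = []
        for key in sorted(rnd):
            household = key[0]
            seen[(household, rnd[key])] += 1
            # Equivalent of rand_inst > 1: a second or later occurrence.
            if seen[(household, rnd[key])] > 1:
                repeats.append(key)
        if not repeats:
            return rnd  # converged: no household has a repeated value
        for key in repeats:
            rnd[key] = rng.randint(low, high)  # redraw only the repeats


result = fill_without_repeats([101, 102], slots=6, low=1, high=13,
                              rng=random.Random(0))
print(result)
```

The number of passes is not bounded in advance, but each pass redraws only the clashing slots, so with 13 values for 6 slots it typically settles after a few iterations.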