Oracle SQL - Delete Entries Based Off Unique Rows

I am pulling a single column from a DB and it looks something like this:
Group
A
A
A
B
B
B
C
D
D
D
E
F
F
F
I need to delete unique entries, so entries A, B, D and F should stay and entries C and E should be deleted.
I am getting this column from a query like this:
select Group from table where type = 'rec';
Basically, each group should appear more than once for a given type, and if it doesn't, it needs to be removed.
NOTE: I need it to be automated and not just a "remove C" and "remove E" because there are thousands of rows and I'm not sure which I will need to delete unless I just find them. The number of rows that will need to be deleted will also be changing, hence why I need it to be automated based off of count.

One method is:
delete t
where "group" in (select "group" from t group by "group" having count(*) = 1);
Based on your sample code:
delete t
where type = 'rec' and
"group" in (select "group" from t where type = 'rec' group by "group" having count(*) = 1);
You could also do this as:
delete t
where type = 'rec' and
not exists (select 1
from t t2
where t2."group" = t."group" and t2.type = 'rec' and t2.rowid <> t.rowid
);
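As a rough way to try the count-based delete without an Oracle instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name t and the column name grp (used instead of the reserved word "group") are assumptions for illustration, and SQLite requires DELETE FROM rather than Oracle's shorter DELETE t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, type TEXT)")
# Sample data from the question: A A A B B B C D D D E F F F, all of type 'rec'.
conn.executemany(
    "INSERT INTO t (grp, type) VALUES (?, 'rec')",
    [(g,) for g in "AAABBBCDDDEFFF"],
)

# Delete every 'rec' row whose group occurs exactly once among 'rec' rows.
conn.execute(
    """
    DELETE FROM t
    WHERE type = 'rec'
      AND grp IN (SELECT grp FROM t WHERE type = 'rec'
                  GROUP BY grp HAVING COUNT(*) = 1)
    """
)

remaining = [row[0] for row in conn.execute("SELECT DISTINCT grp FROM t ORDER BY grp")]
print(remaining)
```

Because the subquery recomputes the counts each time the statement runs, nothing needs to be hard-coded when the set of single-row groups changes.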

Judging by your comments, all you need is a count per group. If an entry occurs only once, select/delete it. Analytic functions are the best and easiest way to do this, if you ask me:
SELECT * FROM
(
SELECT COUNT(grp) OVER (PARTITION BY grp ORDER BY grp) cnt -- number of occurrences --
, grp
FROM
( -- convert to multi-row - REPLACE AAABBB with your actual column --
SELECT trim(regexp_substr('A A A B B B C D D D E F F F', '[^ ]+', 1, LEVEL)) grp
FROM dual -- from your table_name --
CONNECT BY LEVEL <= regexp_count('A A A B B B C D D D E F F F', '[^ ]+')
)
)
WHERE cnt = 1 -- Select/Delete only those that appeared once --
/
Output:
cnt|grp
--------
1 C
1 E
Full output, if you comment out the WHERE clause:
cnt|grp
--------
3 A
3 A
3 A
3 B
3 B
3 B
1 C
3 D
3 D
3 D
1 E
3 F
3 F
3 F
Final edit based on your questions. This simulates your table:
WITH your_table AS
(
SELECT 'rec' grp_type FROM dual
UNION ALL
SELECT 'not_rec' grp_type FROM dual
)
SELECT grp_type FROM your_table WHERE grp_type = 'rec' -- apply all that above to this select --
/
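The analytic-count idea can be sketched against a real table rather than a string split with CONNECT BY. The example below is an illustration only: it uses SQLite from Python (window functions need SQLite 3.25 or newer), and the table t with column grp is an assumed stand-in for the asker's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(g,) for g in "AAABBBCDDDEFFF"])

# Attach the per-group row count to every row, then keep the groups seen once.
singles = conn.execute(
    """
    SELECT grp, cnt
    FROM (SELECT grp, COUNT(*) OVER (PARTITION BY grp) AS cnt FROM t)
    WHERE cnt = 1
    ORDER BY grp
    """
).fetchall()
print(singles)
```

The window function keeps every row and annotates it, which is useful when the other columns of the row are needed alongside the count.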

Related

SQL get the closest two rows within duplicate rows

I have following table
ID Name Stage
1 A 1
1 B 2
1 C 3
1 A 4
1 N 5
1 B 6
1 J 7
1 C 8
1 D 9
1 E 10
Given the parameters A and N, I need the output below: the pair of A and N rows whose stages are closest to each other.
ID Name Stage
1 A 4
1 N 5
That is, I need to select the rows where the difference between stages is smallest.
This query can make use of an index on (name, stage) efficiently:
WITH cte AS (
SELECT TOP 1
a.id AS a_id, a.name AS a_name, a.stage AS a_stage
, n.id AS n_id, n.name AS n_name, n.stage AS n_stage
FROM tbl a
CROSS APPLY (
SELECT TOP 1 *, stage - a.stage AS diff
FROM tbl
WHERE name = 'N'
AND stage >= a.stage
ORDER BY stage
UNION ALL
SELECT TOP 1 *, a.stage - stage AS diff
FROM tbl
WHERE name = 'N'
AND stage < a.stage
ORDER BY stage DESC
) n
WHERE a.name = 'A'
ORDER BY diff
)
SELECT a_id AS id, a_name AS name, a_stage AS stage FROM cte
UNION ALL
SELECT n_id, n_name, n_stage FROM cte;
SQL Server uses CROSS APPLY in place of standard-SQL LATERAL.
In case of ties (equal difference) the winner is arbitrary, unless you add more ORDER BY expressions as tiebreaker.
dbfiddle here
This solution works if you know the minimum difference is always 1:
SELECT *
FROM myTable AS a
CROSS JOIN myTable AS b
WHERE a.name = 'A' AND b.name = 'N'
AND ABS(a.stage - b.stage) = 1;
a.ID a.Name a.Stage b.ID b.Name b.Stage
1 A 4 1 N 5
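Or, more generally, if you don't know the minimum:
SELECT *
FROM myTable AS a
CROSS JOIN myTable AS b
WHERE a.name = 'A' AND b.name = 'N'
AND ABS(a.stage - b.stage) = (SELECT MIN(ABS(a2.stage - b2.stage))
FROM myTable AS a2
CROSS JOIN myTable AS b2
WHERE a2.name = 'A' AND b2.name = 'N')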
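A compact way to check the "closest pair" logic is to join the A rows to the N rows and sort by the absolute stage difference. This is only a sketch run on SQLite from Python: it uses LIMIT 1 where SQL Server would use TOP 1, and joining on id is an assumption that pairs should come from the same ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, stage INTEGER)")
# Sample data from the question: names A B C A N B J C D E at stages 1..10.
conn.executemany(
    "INSERT INTO tbl VALUES (1, ?, ?)",
    list(zip("ABCANBJCDE", range(1, 11))),
)

closest = conn.execute(
    """
    SELECT a.name, a.stage, n.name, n.stage
    FROM tbl AS a
    JOIN tbl AS n ON n.id = a.id
    WHERE a.name = 'A' AND n.name = 'N'
    ORDER BY ABS(a.stage - n.stage), a.stage   -- a.stage breaks ties
    LIMIT 1
    """
).fetchone()
print(closest)
```

This compares every A row with every N row, so it is simpler but will not use an index as efficiently as the CROSS APPLY version on large tables.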

sql excluding certain results

Let's say I have a data set of
A B
-- --
a 1
b 1
c 1
d 1
d 2
e 1
f 1
f 2
g 1
How would I exclude a row with 1 in column B if column B has both 1 and 2 for the same value in column A?
I want my results to look like this:
A B
-- --
a 1
b 1
c 1
d 2
e 1
f 2
g 1
This checks explicitly for the values 1 and 2 and relies on there being exactly two of them. You could make it less cumbersome if it's safe to assume that you always want the highest value.
select
tbl.A,
tbl.B
from
Table1 tbl
left outer join (
select
A
from
Table1
where
B in (1,2)
group by
A
having
count(B) = 2
) mlt on tbl.A = mlt.A
where
(
mlt.A is not null
and tbl.B = 2
) or (
mlt.A is null
and tbl.B = 1
)
Figure out all the A values that have both 1 and 2.
Match those to the table on the A value.
If A is in the subquery, use the B = 2 record. If it isn't, use the B = 1 record.
select * from tbl
where a in (select a from tbl group by a having count(*) > 1)
and b != 1
UNION ALL
select * from tbl
where a in (select a from tbl group by a having count(*) = 1)
For the example data and desired result, the simplest query to achieve the result would be a GROUP BY operation and an aggregate function.
SELECT d.A
, MAX(d.B) AS B
FROM my_data_set d
GROUP BY d.A
ORDER BY d.A
If we are only interested in rows that have a 1 or 2 in column B, we can add a WHERE clause
SELECT d.A
, MAX(d.B) AS B
FROM my_data_set d
WHERE d.B IN (1,2)
GROUP BY d.A
ORDER BY d.A
With the example data, the output is the same.
Both of these statements achieve the specified result. (There is only a single row returned for each distinct value in A.)
Or, for the same example data, we can return the same result set with a more literal implementation of the specification.
To exclude rows with 1 when there is a row with 2 for the same value of A, we can use a NOT EXISTS predicate and a correlated subquery.
SELECT d.A
, d.B
FROM my_data_set d
WHERE ( d.B = 2 )
OR ( d.B = 1 AND
NOT EXISTS ( SELECT 1
FROM my_data_set e
WHERE e.A = d.A
AND e.B = 2
)
)
ORDER BY d.A, d.B
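To see that the MAX/GROUP BY form and the NOT EXISTS form agree on the sample data, the two can be run side by side. The sketch below does that on an in-memory SQLite database from Python; the table name my_data_set follows the answer above, while the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data_set (A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO my_data_set VALUES (?, ?)",
    [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("d", 2),
     ("e", 1), ("f", 1), ("f", 2), ("g", 1)],
)

# Aggregate form: one row per A, keeping the highest B.
by_max = conn.execute(
    "SELECT d.A, MAX(d.B) FROM my_data_set d GROUP BY d.A ORDER BY d.A"
).fetchall()

# Literal form: keep a 1 only when no 2 exists for the same A.
by_not_exists = conn.execute(
    """
    SELECT d.A, d.B
    FROM my_data_set d
    WHERE d.B = 2
       OR (d.B = 1 AND NOT EXISTS (SELECT 1 FROM my_data_set e
                                   WHERE e.A = d.A AND e.B = 2))
    ORDER BY d.A, d.B
    """
).fetchall()
print(by_max == by_not_exists, by_not_exists)
```

The two forms only coincide while B is limited to 1 and 2 with no duplicate rows; with other values or duplicates, the NOT EXISTS form can return several rows per A where the aggregate returns one.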

Shuffle column in Google's BigQuery based on groupby

I want to randomly shuffle the values for one single column of a table based on a groupby. E.g., I have two columns A and B. Now, I want to randomly shuffle column B based on a groupby on A.
For an example, suppose that there are three distinct values in A. Now for each distinct value of A, I want to shuffle the values in B, but just with values having the same A.
Example input:
A B C
-------------------
1 1 x
1 3 a
2 4 c
3 6 d
1 2 a
3 5 v
Example output:
A B C
------------------
1 3 x
1 2 a
2 4 c
3 6 d
1 1 a
3 5 v
In this case, for A=1 the values for B got shuffled. The same happened for A=2, but as there is only one row it stayed like it was. For A=3 by chance the values for B also stayed like they were. The values for column C stay as they are.
Maybe this can be solved by using window functions, but I am unsure how exactly.
As a side note: This should be achieved in Google's BigQuery.
Is this what you're after? (You tagged the question with both MySQL and Oracle, so I'm answering here using Oracle.)
[edit] corrected based on confirmed logic [/edit]
with w_data as (
select 1 a, 1 b from dual union all
select 1 a, 3 b from dual union all
select 2 a, 4 b from dual union all
select 3 a, 6 b from dual union all
select 1 a, 2 b from dual union all
select 3 a, 5 b from dual
),
w_suba as (
select a, row_number() over (partition by a order by dbms_random.value) aid
from w_data
),
w_subb as (
select a, b, row_number() over (partition by a order by dbms_random.value) bid
from w_data
)
select sa.a, sb.b
from w_suba sa,
w_subb sb
where sa.aid = sb.bid
and sa.a = sb.a
/
A B
---------- ----------
1 3
1 1
1 2
2 4
3 6
3 5
6 rows selected.
SQL> /
A B
---------- ----------
1 3
1 1
1 2
2 4
3 5
3 6
6 rows selected.
SQL>
Logic breakdown:
1) w_data is just your sample data set ...
2) randomize column a (not really needed, you could just use rownum here and let b randomize ... but I do so love (over)using dbms_random :) heh )
3) randomize column b (partition by in the analytic function creates the "groups"; order by a random value randomizes the items within each group)
4) join them, using both the group (a) and the randomized id to pick a random item within each group.
Randomizing this way ensures that each group keeps the same values and the same number of rows: if you start with one "3", you end with one "3", and so on.
I think the query below should work in BigQuery:
SELECT
x.A as A, x.B as Old_B, x.c as C, y.B as New_B
FROM (
SELECT A, B, C,
ROW_NUMBER() OVER(PARTITION BY A ORDER BY B, C) as pos
FROM [your_table]
) as x
JOIN (
SELECT
A, B, ROW_NUMBER() OVER(PARTITION BY A ORDER BY rnd) as pos
FROM (
SELECT A, B, RAND() as rnd
FROM [your_table]
)
) as y
ON x.A = y.A AND x.pos = y.pos
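The pairing trick used by both answers (number the rows of each group once in a fixed order and once in random order, then join on group and position) can be tried locally. The following is a sketch on SQLite from Python, not BigQuery: random() and rowid are SQLite features standing in for RAND() and a stable row order, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (A INTEGER, B INTEGER, C TEXT)")
data = [(1, 1, "x"), (1, 3, "a"), (2, 4, "c"), (3, 6, "d"), (1, 2, "a"), (3, 5, "v")]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", data)

shuffled = conn.execute(
    """
    SELECT x.A, y.B AS New_B, x.C
    FROM (SELECT rowid AS rid, A, C,
                 ROW_NUMBER() OVER (PARTITION BY A ORDER BY rowid) AS pos
          FROM your_table) AS x
    JOIN (SELECT A, B,
                 ROW_NUMBER() OVER (PARTITION BY A ORDER BY random()) AS pos
          FROM your_table) AS y
      ON x.A = y.A AND x.pos = y.pos
    ORDER BY x.rid          -- keep the original row order for A and C
    """
).fetchall()
print(shuffled)
```

Since each position number appears exactly once per group on both sides of the join, every B value is reused exactly once and only within its own A group; the actual order differs from run to run.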

Need help constructing a query to group related elements

I have a table containing the IDs of elements that are related.
ID1 ID2
A B
A C
B D
B C
E F
G D
G C
H I
D C
The example contains the following groups:
A,B,C,D,G
E,F
H,I
Since A is connected to B,C, B is connected to C,D and D is connected to G.
E,F and H,I are only related to each other.
Is it possible to find these groups using SQL? Not sure what the output of the SQL would be, maybe something like this:
ID group
A 1
B 1
C 1
D 1
G 1
E 2
F 2
H 3
I 3
Probably some form of hierarchical query will do the trick but those usually baffle me.
Any output format is fine as long as I can discriminate between groups.
Here is what I found:
select root2 || ', ' || listagg(id1, ', ') within group (order by id1) grp
from
(
select id1, max(root2) keep (dense_rank last order by lev) root2
from
(
select t.*, connect_by_root id2 root2, level lev
from <my_table> t
connect by prior t.id1 = t.id2
)
group by id1
)
group by root2
;
This gives:
**GRP**
C, A, B, D, G
F, E
I, H
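For databases without CONNECT BY, a recursive common table expression is one possible alternative. The sketch below is a different technique from the hierarchical query above: it treats each pair as an undirected edge, computes everything reachable from each node, and labels each group by its smallest member. It runs on SQLite from Python, and the table name rel is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (id1 TEXT, id2 TEXT)")
pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "C"), ("E", "F"),
         ("G", "D"), ("G", "C"), ("H", "I"), ("D", "C")]
conn.executemany("INSERT INTO rel VALUES (?, ?)", pairs)

groups = conn.execute(
    """
    WITH RECURSIVE
      edges(x, y) AS (            -- treat every pair as undirected
        SELECT id1, id2 FROM rel
        UNION
        SELECT id2, id1 FROM rel
      ),
      reach(node, other) AS (     -- everything reachable from each node
        SELECT x, x FROM edges
        UNION                     -- UNION (not UNION ALL) terminates on cycles
        SELECT r.node, e.y
        FROM reach AS r JOIN edges AS e ON e.x = r.other
      )
    SELECT node, MIN(other) AS grp   -- smallest reachable id labels the group
    FROM reach
    GROUP BY node
    ORDER BY grp, node
    """
).fetchall()
print(groups)
```

The reachability table grows quadratically with group size, so this is suited to small graphs; for large ones, a union-find pass in application code is likely a better fit.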

Sql view with column identifier

I am creating a select query with union of three tables....
like this
select a as A,b as B,c as C where c = x union
select b as A,d as B,e as C where e = y and d = a union
select f as A,g as B,h as C
and the result of query is like this:
A B C
===========
1 abc ...
55 def ...
1 sas ...
So I want a column that numbers the rows, to avoid repeated identifiers.
Something like this:
Row A B C
================
1 1 abc ...
2 55 def ...
3 1 sas ...
....
My question is how it can be done?
You can use ROW_NUMBER() like this:
SELECT ROW_NUMBER() OVER (ORDER BY A,B,C) AS RowNo, *
FROM
(
select a as A,b as B,c as C where c = x
union
select b as A,d as B,e as C where e = y and d = a
union
select f as A,g as B,h as C
) x
CREATE VIEW dbo.vname
AS
SELECT [Row] = ROW_NUMBER() OVER (ORDER BY A), A, B, C FROM
( <UNION query here> ) AS x;
Replace ORDER BY A with whatever ordering you'd like to see applied. Note that you will need to use ORDER BY on the outer query against dbo.vname to guarantee that Row will come out in that order.
You can use a common table expression to achieve this:
WITH unionTable
AS
(
select a as A, b as B, c as C where c = x union
select b as A, d as B, e as C where e = y and d = a union
select f as A, g as B, h as C
)
SELECT ROW_NUMBER() OVER (ORDER BY A) AS RowNumber, *
FROM unionTable
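To illustrate numbering the rows of a UNION result, here is a small sketch on SQLite from Python (window functions need SQLite 3.25 or newer). The tables t1 and t2 and their contents are made up for the example, since the question does not show its FROM clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (a INTEGER, b TEXT, c TEXT);
    CREATE TABLE t2 (a INTEGER, b TEXT, c TEXT);
    INSERT INTO t1 VALUES (1, 'abc', 'p'), (55, 'def', 'q');
    INSERT INTO t2 VALUES (1, 'sas', 'r'), (1, 'abc', 'p');
    """
)

# UNION removes the duplicate (1, 'abc', 'p'); the outer query numbers what is left.
numbered = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY A, B, C) AS RowNo, A, B, C
    FROM (SELECT a AS A, b AS B, c AS C FROM t1
          UNION
          SELECT a, b, c FROM t2) AS u
    ORDER BY RowNo
    """
).fetchall()
print(numbered)
```

Ordering the window by all three columns makes the numbering deterministic; with ORDER BY A alone, rows that share the same A could be numbered in either order.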