I have a table with fields (simplified):
id, fld1, fld2, fld3.
id is a numeric primary key field.
There are duplicates: id differs but fld1, fld2 and fld3 are identical over 2 or more rows. There are also entries where the values occur only once, i.e. non-duplicates, of course.
Of each set of duplicate entries, I want to retain only the entry with the highest ID. I was planning to first list the doomed rows and then to delete them.
My first stab at it was this:
SELECT * FROM tab1 t1 WHERE EXISTS (
SELECT COUNT(*) FROM tab1 t2
WHERE t1.fld1 = t2.fld1 AND t1.fld2 = t2.fld2 AND t1.fld3 = t2.fld3
AND t1.id < MAX(t2.id)
HAVING COUNT(*) > 1
GROUP BY t2.fld1, t2.fld2, t2.fld3)
But (in Oracle) I'm getting a Missing right parenthesis error message. I think this needs a new approach altogether, but my SQL-fu is not up to the task. Help appreciated!
Edit:
With 'real' data fields:
select x.leg_id, x.airline_des, x.flight_nr, x.suffix, x.flight_id_date, x.lt_flight_id_date
from fdb_leg x
join ( select max(t.leg_id) 'max_id',
t.airline_des, t.flight_nr, t.suffix, t.flight_id_date, t.lt_flight_id_date
from fdb_leg t
group by t.airline_des, t.flight_nr, t.suffix, t.flight_id_date, t.lt_flight_id_date
having count(*) > 1) y on y.max_id > x.leg_id
and y.airline_des = x.airline_des and y.flight_nr = x.flight_nr and y.suffix = x.suffix
and y.flight_id_date = x.flight_id_date and x.lt_flight_id_date = y.lt_flight_id_date
Response is:
ORA-00923: FROM keyword not found where expected
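That ORA-00923 is most likely caused by the single-quoted alias 'max_id': Oracle does not accept a string literal as a column alias, so write it unquoted (or double-quoted) instead. A corrected version of the inner query, using the same table and columns:
select max(t.leg_id) as max_id,
       t.airline_des, t.flight_nr, t.suffix, t.flight_id_date, t.lt_flight_id_date
from fdb_leg t
group by t.airline_des, t.flight_nr, t.suffix, t.flight_id_date, t.lt_flight_id_date
having count(*) > 1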
Oracle 9i+, Using WITH:
To get the list of doomed entries, use:
WITH keepers AS (
SELECT MAX(t.id) AS max_id,
t.fld1, t.fld2, t.fld3
FROM TABLE_1 t
GROUP BY t.fld1, t.fld2, t.fld3
HAVING COUNT(*) > 1)
SELECT x.id,
x.fld1, x.fld2, x.fld3
FROM TABLE_1 x
JOIN keepers y ON y.max_id > x.id
AND y.fld1 = x.fld1
AND y.fld2 = x.fld2
AND y.fld3 = x.fld3
Non-WITH Equivalent:
To get the list of doomed entries, use:
SELECT x.id,
x.fld1, x.fld2, x.fld3
FROM TABLE_1 x
JOIN (SELECT MAX(t.id) AS max_id,
t.fld1, t.fld2, t.fld3
FROM TABLE_1 t
GROUP BY t.fld1, t.fld2, t.fld3
HAVING COUNT(*) > 1) y ON y.max_id > x.id
AND y.fld1 = x.fld1
AND y.fld2 = x.fld2
AND y.fld3 = x.fld3
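If you then want to remove those doomed rows directly, here is a minimal sketch of the corresponding DELETE (same placeholder table and columns as above):
DELETE FROM TABLE_1 x
WHERE EXISTS (SELECT 1
              FROM TABLE_1 t
              WHERE t.fld1 = x.fld1
                AND t.fld2 = x.fld2
                AND t.fld3 = x.fld3
                AND t.id > x.id)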
You can delete them in one shot, like this:
SQL> create table mytable (id, fld1, fld2, fld3)
2 as
3 select 1, 1, 1, 1 from dual union all
4 select 2, 1, 1, 1 from dual union all
5 select 3, 2, 2, 2 from dual union all
6 select 4, 2, 3, 2 from dual union all
7 select 5, 2, 3, 2 from dual union all
8 select 6, 2, 3, 2 from dual
9 /
Table created.
SQL> delete mytable
2 where id not in
3 ( select max(id)
4 from mytable
5 group by fld1
6 , fld2
7 , fld3
8 )
9 /
3 rows deleted.
SQL> select * from mytable
2 /
ID FLD1 FLD2 FLD3
---------- ---------- ---------- ----------
2 1 1 1
3 2 2 2
6 2 3 2
3 rows selected.
Regards,
Rob.
Ugh, I get it. Scratch that.
This will identify the IDs to keep; everything not in that set can be deleted.
Select
fld1
, fld2
, fld3
, Max(ID)
From table_name
Group By
fld1
, fld2
, fld3
Related
I have a requirement in which I have two tables: one stores conditions and the second stores lookup values.
For example:
Table 1 (Condition Table):
Condition
"100000010073024" = "BILLED"
"100000010073027" = "Not Billed"
"100000010073026" = "Not Billed" Or "100000010073055" = "Billed"
Table 2 (Lookup Values):
Lookup Id          Meaning
100000010073024    Test
100000010073027    Test1
100000010073026    Test2
100000010073055    Test3
So I want results like:
Result
Test = "BILLED"
Test1 = "Not Billed"
Test2 = "Not Billed" Or Test3 = "Billed"
So how can I achieve this through Oracle SQL?
Here's one option.
Based on the sample data you posted:
SQL> with
2 t1 (condition) as
3 (select '"100000010073024" = "BILLED"' from dual union all
4 select '"100000010073027" = "Not Billed"' from dual union all
5 select '"100000010073026" = "Not Billed" Or "100000010073055" = "Billed"' from dual
6 ),
7 t2 (lookup_id, meaning) as
8 (select 100000010073024, 'Test' from dual union all
9 select 100000010073027, 'Test1' from dual union all
10 select 100000010073026, 'Test2' from dual union all
11 select 100000010073055, 'Test3' from dual
12 ),
13 --
Split the conditions into rows (separated by "or") so that you can work on each of them separately in the final select:
14 -- split conditions to rows
15 splcon as
16 (select trim(regexp_substr(replace(upper(condition), 'OR', '#'), '[^#)]+', 1, column_value)) condition
17 from t1 cross join
18 table(cast(multiset(select level from dual
19 connect by level <= regexp_count(upper(condition), 'OR') + 1
20 ) as sys.odcinumberlist))
21 )
The join condition uses the instr function (checking whether the condition contains the lookup_id or not):
22 select
23 b.meaning ||
24 replace(substr(s.condition, instr(s.condition, '=') - 1), '"', '') as result
25 from splcon s join t2 b on instr(s.condition, b.lookup_id) > 0;
RESULT
------------------------------
Test = BILLED
Test1 = NOT BILLED
Test2 = NOT BILLED
Test3 = BILLED
SQL>
The simplest method may be to refactor your conditions table to have each lookup id/term pair in a separate row:
CREATE TABLE conditions (id, lookup_id, term) AS
SELECT 1, 100000010073024, 'BILLED' FROM DUAL UNION ALL
SELECT 2, 100000010073027, 'Not Billed' FROM DUAL UNION ALL
SELECT 3, 100000010073026, 'Not Billed' FROM DUAL UNION ALL
SELECT 3, 100000010073055, 'Billed' FROM DUAL;
Then you could also add referential constraints to the lookups table.
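This assumes a lookups table along these lines, with lookup_id as its primary key so the foreign key can reference it (the datatypes here are only illustrative):
CREATE TABLE lookups (
  lookup_id NUMBER       PRIMARY KEY,
  meaning   VARCHAR2(30) NOT NULL
);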
ALTER TABLE conditions ADD CONSTRAINT conditions__lookup_id__fk
FOREIGN KEY (lookup_id) REFERENCES lookups(lookup_id);
The query is simply:
SELECT LISTAGG('"' || l.meaning || '" = "' || c.term || '"', ' OR ')
WITHIN GROUP (ORDER BY ROWNUM) AS result
FROM conditions c
INNER JOIN lookups l
ON (c.lookup_id = l.lookup_id)
GROUP BY c.id;
Which outputs:
RESULT
"Test" = "BILLED"
"Test1" = "Not Billed"
"Test2" = "Not Billed" OR "Test3" = "Billed"
db<>fiddle here
After every group / row I want to insert a hardcoded dummy row with a bunch of 'xxxx' values to act as a separator.
I would like to do this in plain Oracle SQL. I could do it with a loop, but I don't want to use PL/SQL.
As the others suggest, it is best to do this on the front end.
However, if you have a burning need for it to be done as a query, here is how.
I did not use the rownum pseudocolumn here, since you already have row numbers in your data. I assume your data is returned by a query, and you can replace my table with that query.
I made a few more assumptions, as your data already includes row numbers.
[I am not sure what you mean by "not PL/SQL".]
Select Case When MOD(rownm, 2) = 0 then ' '
Else to_char((rownm + 1) / 2) End as rownm,
name, total, column1
From
(
select (rownm * 2 - 1) rownm,name, to_char(total) total ,column1 from t
union
SELECT (rownm * 2) rownm,'XXX' name, 'XXX' total, 'The row act .... ' column1 FROM t
) Q
Order by Q.rownm;
and here is the fiddle
Since you're already grouping the data, it might be easier to use GROUPING SETS instead of a UNION.
Grouping sets let you group by multiple sets of columns, including the same set twice to duplicate rows. Then the GROUP_ID function can be used to determine when the fake values should be used. This code will be a bit smaller than a UNION approach, and should be faster since it doesn't need to reference the table multiple times.
select
case when group_id() = 0 then name else '' end name,
case when group_id() = 0 then sum(some_value) else null end total,
case when group_id() = 1 then 'this rows...' else '' end column1
from
(
select 'jack' name, 22 some_value from dual union all
select 'jack' name, 1 some_value from dual union all
select 'john' name, 44 some_value from dual union all
select 'john' name, 1 some_value from dual union all
select 'harry' name, 1 some_value from dual union all
select 'harry' name, 1 some_value from dual
) raw_data
group by grouping sets (name, name)
order by raw_data.name, group_id();
You can use a row generator technique (using CONNECT BY) and then CASE..WHEN, as follows:
SQL> SELECT CASE WHEN L.LVL = 1 THEN T.ROWNM END AS ROWNM,
2 CASE WHEN L.LVL = 1 THEN T.NAME
3 ELSE 'XXX' END AS NAME,
4 CASE WHEN L.LVL = 1 THEN TO_CHAR(T.TOTAL)
5 ELSE 'XXX' END AS TOTAL,
6 CASE WHEN L.LVL = 1 THEN T.COLUMN1
7 ELSE 'This row act as separator..' END AS COLUMN1
8 FROM T CROSS JOIN (
9 SELECT LEVEL AS LVL FROM DUAL CONNECT BY LEVEL <= 2
10 ) L ORDER BY T.ROWNM, L.LVL;
ROWNM NAME TOTAL COLUMN1
---------- ---------- ----- ---------------------------
1 Jack 23
XXX XXX This row act as separator..
2 John 45
XXX XXX This row act as separator..
3 harry 2
XXX XXX This row act as separator..
4 roy 45
XXX XXX This row act as separator..
5 Jacob 26
XXX XXX This row act as separator..
10 rows selected.
SQL>
I'm using Oracle SQL and I have a product table with different attributes and the sales volume for each product, and another table with certain exclusion rules at different levels of aggregation. Let's look at an example:
Here is our main table with sales data on which we want to perform some calculations:
And the other table contains different rules which are supposed to exclude certain rows from the table above:
Where there is an "x", that column shouldn't be considered, so our rules are:
1. exclude all rows with ATTR_3 = 'no'
2. exclude all rows with ATTR_1 = 'Europe' and ATTR_2 = 'snacks' and ATTR_3 = 'no'
3. exclude all rows with ATTR_1 = 'Africa'
And based on that, our final output should look like this:
How could this be achieved in SQL? I was thinking about a join, but I have no idea how to handle the different levels of aggregation for the exclusions.
I think your expected output is wrong. None of the rules excludes the 2nd row (Europe - snacks - yes).
SQL> with
2 -- sample data
3 test (product_id, attr_1, attr_2, attr_3) as
4 (select 81928 , 'Europe', 'beverages', 'yes' from dual union all
5 select 16534 , 'Europe', 'snacks' , 'yes' from dual union all
6 select 56468 , 'USA' , 'snacks' , 'no' from dual union all
7 select 129921, 'Africa', 'drinks' , 'yes' from dual union all
8 select 123021, 'Africa', 'snacks' , 'yes' from dual union all
9 select 165132, 'USA' , 'drinks' , 'yes' from dual
10 ),
11 rules (attr_1, attr_2, attr_3) as
12 (select 'x' , 'x' , 'no' from dual union all
13 select 'Europe', 'snacks', 'no' from dual union all
14 select 'Africa', 'x' , 'x' from dual
15 )
16 -- query you need
17 select t.*
18 from test t
19 where (t.attr_1, t.attr_2, t.attr_3) not in
20 (select
21 decode(r.attr_1, 'x', t.attr_1, r.attr_1),
22 decode(r.attr_2, 'x', t.attr_2, r.attr_2),
23 decode(r.attr_3, 'x', t.attr_3, r.attr_3)
24 from rules r
25 );
PRODUCT_ID ATTR_1 ATTR_2 ATT
---------- ------ --------- ---
81928 Europe beverages yes
16534 Europe snacks yes
165132 USA drinks yes
SQL>
You can use a left join as an anti-join, treating 'x' as a wildcard, as follows:
SELECT P.*
FROM PRODUCT P
LEFT JOIN RULESS R
   ON (R.ATTR_1 = 'x' OR P.ATTR_1 = R.ATTR_1)
  AND (R.ATTR_2 = 'x' OR P.ATTR_2 = R.ATTR_2)
  AND (R.ATTR_3 = 'x' OR P.ATTR_3 = R.ATTR_3)
WHERE R.ATTR_1 IS NULL
You can use NOT EXISTS
SELECT *
FROM sales s
WHERE NOT EXISTS (
SELECT 0
FROM attributes a
WHERE ( ( a.attr_1 = s.attr_1 AND a.attr_1 IS NOT NULL )
OR a.attr_1 IS NULL )
AND ( ( a.attr_2 = s.attr_2 AND a.attr_2 IS NOT NULL )
OR a.attr_2 IS NULL )
AND ( ( a.attr_3 = s.attr_3 AND a.attr_3 IS NOT NULL )
OR a.attr_3 IS NULL )
)
where I considered the x values within the attributes table to be NULL. If you really have x characters, then you can use:
SELECT *
FROM sales s
WHERE NOT EXISTS (
SELECT 0
FROM attributes a
WHERE ( ( NVL(a.attr_1,'x') = s.attr_1 AND NVL(a.attr_1,'x')!='x' )
OR NVL(a.attr_1,'x')='x' )
AND ( ( NVL(a.attr_2,'x') = s.attr_2 AND NVL(a.attr_2,'x')!='x' )
OR NVL(a.attr_2,'x')='x' )
AND ( ( NVL(a.attr_3,'x') = s.attr_3 AND NVL(a.attr_3,'x')!='x' )
OR NVL(a.attr_3,'x')='x' )
)
instead.
Demo
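Those NVL conditions can also be collapsed a little; an equivalent sketch of the same NOT EXISTS, with the same table and column names:
SELECT *
FROM sales s
WHERE NOT EXISTS (
    SELECT 0
    FROM attributes a
    WHERE ( a.attr_1 = s.attr_1 OR NVL(a.attr_1, 'x') = 'x' )
      AND ( a.attr_2 = s.attr_2 OR NVL(a.attr_2, 'x') = 'x' )
      AND ( a.attr_3 = s.attr_3 OR NVL(a.attr_3, 'x') = 'x' )
)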
I would do this with three different not exists:
select p.*
from product p
where not exists (select 1
from rules r
where r.attr_1 = p.attr_1 and r.attr_1 <> 'x'
) and
not exists (select 1
from rules r
where r.attr_2 = p.attr_2 and r.attr_2 <> 'x'
) and
not exists (select 1
from rules r
where r.attr_3 = p.attr_3 and r.attr_3 <> 'x'
) ;
In particular, this can take advantage of indexes on (attr_1), (attr_2) and (attr_3) -- something that is quite handy if you have a moderate number of rules.
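A sketch of those indexes on the rules table (the index names are just illustrative):
CREATE INDEX rules_attr_1_ix ON rules (attr_1);
CREATE INDEX rules_attr_2_ix ON rules (attr_2);
CREATE INDEX rules_attr_3_ix ON rules (attr_3);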
Suppose I have a list of values, such as 1, 2, 3, 4, 5 and a table where some of those values exist in some column. Here is an example:
id name
1 Alice
3 Cindy
5 Elmore
6 Felix
I want to create a SELECT statement that will include all of the values from my list as well as the information from those rows that match the values, i.e., perform a LEFT OUTER JOIN between my list and the table, so the result would be as follows:
id name
1 Alice
2 (null)
3 Cindy
4 (null)
5 Elmore
How do I do that without creating a temp table or using multiple UNION operators?
If you're on Microsoft SQL Server 2008 or later, you can use a Table Value Constructor:
Select v.valueId, m.name
From (values (1), (2), (3), (4), (5)) v(valueId)
left Join otherTable m
on m.id = v.valueId
Postgres also has this construct, VALUES lists:
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter)
Also note the possible Common Table Expression syntax, which can be handy for joins:
WITH my_values(num, str) AS (
   VALUES (1, 'one'), (2, 'two'), (3, 'three')
)
SELECT num, str FROM my_values
With Oracle it's possible too, though heavier. From Ask Tom:
with id_list as (
select 10 id from dual union all
select 20 id from dual union all
select 25 id from dual union all
select 70 id from dual union all
select 90 id from dual
)
select * from id_list;
The following solution for Oracle is adapted from this source. The basic idea is to exploit Oracle's hierarchical queries. You have to specify a maximum length of the list (100 in the sample query below).
select d.lstid
, t.name
from (
select substr(
csv
, instr(csv,',',1,lev) + 1
, instr(csv,',',1,lev+1 )-instr(csv,',',1,lev)-1
) lstid
from (select ','||'1,2,3,4,5'||',' csv from dual)
, (select level lev from dual connect by level <= 100)
where lev <= length(csv)-length(replace(csv,','))-1
) d
left join test t on ( d.lstid = t.id )
;
Check out this SQL Fiddle to see it work.
Bit late on this, but for Oracle you could do something like this to get a table of values:
SELECT rownum + 5 /*start*/ - 1 as myval
FROM dual
CONNECT BY LEVEL <= 100 /*end*/ - 5 /*start*/ + 1
... And then join that to your table:
SELECT *
FROM
(SELECT rownum + 1 /*start*/ - 1 myval
FROM dual
CONNECT BY LEVEL <= 5 /*end*/ - 1 /*start*/ + 1) mypseudotable
left outer join myothertable
on mypseudotable.myval = myothertable.correspondingval
Assuming myTable is the name of your table, the following code should work.
;with x as
(
select top (select max(id) from [myTable]) number from [master]..spt_values
),
y as
(select row_number() over (order by x.number) as id
from x)
select y.id, t.name
from y left join myTable as t
on y.id = t.id;
Caution: this is a SQL Server implementation.
fiddle
To generate the sequential numbers required for part of the output (this method avoids having to type out the values for n numbers):
create table ##table (id int)   -- global temp table to hold the numbers; the id column name is assumed below
declare @site as int
set @site = 1
while @site <= 200
begin
    insert into ##table
    values (@site)
    set @site = @site + 1
end
Final output (after the step above):
select * from ##table
select v.id,m.name from ##table as v
left outer join [source_table] m
on m.id=v.id
Suppose the table that has the values 1, 2, 3, 4, 5 is named list_of_values, and the table that contains some of those values along with the name column is named some_values; then you can do:
SELECT B.id,A.name
FROM [list_of_values] AS B
LEFT JOIN [some_values] AS A
ON B.ID = A.ID
I have the following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would represent the data group (A, B), DataID 103 would represent (C), etc., so that I can rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
In my opinion you have to create a custom aggregate that concatenates data (for strings, a CLR approach is recommended for performance reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number() or a dense_rank() function, your choice. Anyway, it should look something like this:
with groupings as (
    select concat(data) as groups   -- concat() stands in for the custom aggregate
    from Table1
    group by ID
)
select distinct groups, dense_rank() over (order by groups) as DataID
from groupings
The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic will only work if the "Data" values are both fixed and known beforehand.
In your case, you say that is so. But considering that you also say there are 1000 of them, this will, frankly, be a ridiculous-looking query for sure :-)
LuckyLuke's suggestion above would be the more generic and probably saner way to go about implementing the solution in your case.
From your sample data (having added the missing 2, 'A' tuple), the following gives the renumbered (and de-duplicated) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
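One way to do that matching (set equality per id) is the classic double NOT EXISTS division; a sketch, assuming the Renumbered output above has been saved into a hypothetical temp table #groups(DataID, Data), with the 100 offset only to mimic the question's numbering:
SELECT t.id,
       100 + g.DataID AS DataID
FROM (SELECT DISTINCT id FROM #t1) t
CROSS JOIN (SELECT DISTINCT DataID FROM #groups) g
WHERE NOT EXISTS (SELECT Data FROM #t1 WHERE id = t.id
                  EXCEPT
                  SELECT Data FROM #groups WHERE DataID = g.DataID)
  AND NOT EXISTS (SELECT Data FROM #groups WHERE DataID = g.DataID
                  EXCEPT
                  SELECT Data FROM #t1 WHERE id = t.id)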
Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then proceeding analogously to LuckyLuke's solution.
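Completing that idea, the concatenated group string can be ranked directly to produce the DataID mapping; a sketch (the 100 offset just mimics the question's numbering):
SELECT DISTINCT t1.id,
       100 + DENSE_RANK() OVER (ORDER BY D.Data) AS DataID
FROM #t1 t1
CROSS APPLY (
    SELECT CAST(Data AS VARCHAR(10)) + ','
    FROM #t1 t2
    WHERE t2.id = t1.id
    ORDER BY Data ASC
    FOR XML PATH('') ) D ( Data )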