How can I delete duplicate rows in the same table that have identical CLOB data?


I have a table in Oracle of which one of the columns (named CONTENTSTRING) is a CLOB. However, some of the rows in this table have identical data in this column. What I'd like to do is remove all the rows except for one that have this identical data. How can I accomplish this?
Googling around, I see a ton of examples for comparing two columns. I also see examples comparing between two different tables. What I don't see is an example using one table and just comparing the rows! I do think I might need to use this function: dbms_lob.compare. However, I'm still not sure how I can set this function up.
From a programmer's perspective, I would think maybe I should do something like:
SELECT CONTENTSTRING FROM TABLE_ALPHA A
and then somehow do another select from the same table as TABLE_ALPHA B, and then use dbms_lob.compare to compare the two columns. If the row numbers are different AND the column contents are equal, then the row from TABLE_ALPHA B can be deleted.
I think that's the right approach, but how exactly would I write this out in Oracle using SQL? I would appreciate any help or resources on this. Thanks!

DELETE
FROM TABLE_ALPHA A
WHERE EXISTS (
SELECT 1 FROM TABLE_ALPHA B
WHERE DBMS_LOB.COMPARE(A.CONTENTSTRING, B.CONTENTSTRING) = 0
AND A.ROWID > B.ROWID
)
This deletes all duplicates except the first one, that is, the row with the smallest ROWID in each group of identical CLOBs.
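The shape of this self-referencing DELETE can be tried out without an Oracle instance. The following is a minimal sketch using Python's standard sqlite3 module, under two loudly stated assumptions: SQLite has no DBMS_LOB.COMPARE, so plain = on a TEXT column stands in for it, and SQLite's rowid stands in for Oracle's ROWID. The function name dedupe_by_rowid and the sample data are made up for illustration.

```python
import sqlite3


def dedupe_by_rowid(contents):
    """Load the values into a table, delete duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_alpha (contentstring TEXT)")
    conn.executemany(
        "INSERT INTO table_alpha VALUES (?)", [(c,) for c in contents]
    )
    # A row is deleted when another row with a smaller rowid holds equal
    # content. ASSUMPTION: '=' replaces Oracle's DBMS_LOB.COMPARE(...) = 0.
    conn.execute(
        """
        DELETE FROM table_alpha
        WHERE EXISTS (
            SELECT 1
            FROM table_alpha AS b
            WHERE b.contentstring = table_alpha.contentstring
              AND table_alpha.rowid > b.rowid
        )
        """
    )
    remaining = [
        row[0]
        for row in conn.execute(
            "SELECT contentstring FROM table_alpha ORDER BY rowid"
        )
    ]
    conn.close()
    return remaining


if __name__ == "__main__":
    print(dedupe_by_rowid(["a", "b", "a", "c", "b"]))
```

The row with the smallest rowid in each group is never matched by the EXISTS condition, which is why exactly one copy survives.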

This answer assumes that you have a primary key field in the source table (I called it id).
You can use a subquery to list the ids of the duplicated records: it self-joins the table using dbms_lob.compare plus a comparison on the id. If duplicate rows exist with the same CLOB content, all ids but the oldest (i.e. the smallest) are selected. The outer query just deletes the selected ids. The IS NULL check treats rows with NULL content as duplicates of each other (if that's not relevant for your use case, just remove it).
DELETE FROM TABLE_ALPHA
WHERE id IN (
SELECT b.id
FROM TABLE_ALPHA a
INNER JOIN TABLE_ALPHA b
ON
(
(a.contentString IS NULL AND b.contentString IS NULL)
OR dbms_lob.compare(a.CONTENTSTRING, b.CONTENTSTRING) = 0
)
AND b.id > a.id
);
See this db fiddle.
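To see how the id-based variant treats NULL content, here is a small sketch in Python with the standard sqlite3 module. It is an approximation, not the Oracle statement itself: the = operator on a TEXT column is an assumed stand-in for dbms_lob.compare, and the function name dedupe_by_id and the sample rows are hypothetical.

```python
import sqlite3


def dedupe_by_id(rows):
    """rows: list of (id, content) pairs; returns the ids that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_alpha (id INTEGER PRIMARY KEY, contentstring TEXT)"
    )
    conn.executemany("INSERT INTO table_alpha VALUES (?, ?)", rows)
    # The subquery lists every id that has an older twin (smaller id).
    # Two NULL contents count as twins because of the explicit IS NULL test.
    # ASSUMPTION: '=' replaces dbms_lob.compare(...) = 0.
    conn.execute(
        """
        DELETE FROM table_alpha
        WHERE id IN (
            SELECT b.id
            FROM table_alpha AS a
            INNER JOIN table_alpha AS b
              ON (
                   (a.contentstring IS NULL AND b.contentstring IS NULL)
                   OR a.contentstring = b.contentstring
                 )
             AND b.id > a.id
        )
        """
    )
    ids = [row[0] for row in conn.execute("SELECT id FROM table_alpha ORDER BY id")]
    conn.close()
    return ids


if __name__ == "__main__":
    sample = [(1, "x"), (2, None), (3, "x"), (4, None), (5, "y")]
    print(dedupe_by_id(sample))
```

Dropping the IS NULL branch would leave both NULL rows in place, since NULL never compares equal to NULL.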

Related

Null values not being returned in Postgresql

I have two tables in PostgreSQL that I need to join while keeping the rows that have no match (the null values), so that I can fill in other values in another column of the joined result.
Table one:
I filtered by date, because this data is generated daily and I only need the current_date
Table two: All names.
In table two I have 9 names that are not found in table one.
When I try to perform the join, I only get the 9 names from table one as a result.
Trying with date from table one to current_date
But if I don't filter the date from table one, the null value is returned.
That is, the name that is in table two but not in table one.
What I need is to join the two tables and where there is no asset referring to the second table, fill it with 0 (zero).
In this part I understood that I must use COALESCE(vcm.ativo,0).
But first I need the names of the second table to appear as well.
The result should be like this:
If someone could help me, I'll be grateful.

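As pointed out in a comment by the asker, the solution turned out to be filtering on the date inside a CTE before the left join, so that the date condition no longer discards the unmatched rows: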
with todays_data as (
select vcm.cooperativa, vcm.ativo
from sga_bi.veiculos_coop_mensal as vcm
where data = current_date
)
select coop.nome, COALESCE(vcmm.ativo,0)
from sga.cooperativas as coop
left outer join todays_data as vcmm
on coop.nome = vcmm.cooperativa
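The difference between filtering before the join and filtering after it can be reproduced on a toy data set. This is a hedged sketch in Python with the standard sqlite3 module rather than PostgreSQL; the table names are shortened, the sample rows are invented, and a fixed date string replaces current_date so the result is repeatable.

```python
import sqlite3

TODAY = "2024-01-02"  # ASSUMPTION: fixed date standing in for current_date


def actives_per_coop(filter_in_where):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cooperativas (nome TEXT);
        CREATE TABLE veiculos_coop_mensal (cooperativa TEXT, ativo INTEGER, data TEXT);
        INSERT INTO cooperativas VALUES ('A'), ('B'), ('C');
        INSERT INTO veiculos_coop_mensal VALUES
            ('A', 5, '2024-01-02'),
            ('B', 3, '2024-01-01');
        """
    )
    if filter_in_where:
        # The WHERE clause runs after the join and removes every row whose
        # right-hand side is NULL or has another date.
        sql = """
            SELECT coop.nome, COALESCE(vcm.ativo, 0)
            FROM cooperativas AS coop
            LEFT JOIN veiculos_coop_mensal AS vcm
              ON coop.nome = vcm.cooperativa
            WHERE vcm.data = ?
            ORDER BY coop.nome
        """
    else:
        # Filtering inside the CTE keeps all names; missing ones become 0.
        sql = """
            WITH todays_data AS (
                SELECT cooperativa, ativo
                FROM veiculos_coop_mensal
                WHERE data = ?
            )
            SELECT coop.nome, COALESCE(t.ativo, 0)
            FROM cooperativas AS coop
            LEFT JOIN todays_data AS t
              ON coop.nome = t.cooperativa
            ORDER BY coop.nome
        """
    rows = conn.execute(sql, (TODAY,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(actives_per_coop(filter_in_where=True))
    print(actives_per_coop(filter_in_where=False))
```

Moving the date condition into the ON clause of the left join would have the same effect as the CTE.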

How to get the differences between two - kind of - duplicated tables (sql)

Prolog:
I have two tables in two different databases, one is an updated version of the other. For example we could imagine that one year ago I duplicated table 1 in the new db (say, table 2), and from then I started working on table 2 never updating table 1.
I would like to compare the two tables to get the differences that have accumulated over this period (the tables have kept the same structure, so the comparison is meaningful).
My approach was to create a third table, copy both table 1 and table 2 into it, and then count the number of repetitions of every entry.
In my opinion this, together with a new attribute that records which table each entry comes from, would do the job.
Problem:
When copying the two tables into the third table I get the (obvious) error about duplicate key values in a unique or primary key constraint.
How can I get around the error, or how could I do the same job better? Any idea is appreciated.

Something like this should do what you want if A and B have the same structure; otherwise just select and rename the columns you want to compare:
SELECT
*
FROM
B
WHERE NOT EXISTS (SELECT * FROM A WHERE A.col = B.col and ....)
If NOT EXISTS doesn't work in your DBMS you could also use a left outer join comparing the rows' column values and keep the rows of A that have no match in B:
SELECT
A.*
from
A left outer join B
on A.col = B.col and ....
where B.col is null
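To get the differences in both directions, tagged with the table each row comes from (the extra attribute the question asks for), the NOT EXISTS check can be run once per table and the results combined. Below is a minimal sketch in Python with the standard sqlite3 module; the two-column layout (id, val), the function name table_differences, and the sample rows are assumptions for illustration, and a real comparison would list every column of the actual tables.

```python
import sqlite3


def table_differences(rows_a, rows_b):
    """Return (source, id, val) for every row present in only one table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT);
        """
    )
    conn.executemany("INSERT INTO a VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
    # No third table is needed, so no key collision can occur: each half
    # keeps the rows that have no identical counterpart in the other table.
    diff = conn.execute(
        """
        SELECT 'A' AS source, a.id, a.val
        FROM a
        WHERE NOT EXISTS (
            SELECT 1 FROM b WHERE b.id = a.id AND b.val = a.val
        )
        UNION ALL
        SELECT 'B' AS source, b.id, b.val
        FROM b
        WHERE NOT EXISTS (
            SELECT 1 FROM a WHERE a.id = b.id AND a.val = b.val
        )
        ORDER BY 1, 2
        """
    ).fetchall()
    conn.close()
    return diff


if __name__ == "__main__":
    old = [(1, "x"), (2, "y"), (3, "z")]
    new = [(1, "x"), (2, "changed"), (4, "new")]
    for row in table_differences(old, new):
        print(row)
```

A row that was modified shows up twice, once with its old values under A and once with its new values under B, while added or removed rows show up once.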

Insert with select, dependent on the values in the table inserting into EDITED

So I need to figure out how to insert into a table, from another table, with a where clause that requires me to access the table that I am inserting into. I tried an alias from the table I am inserting into, but I quickly found out that you cannot do that. Basically, what I want to check is that the values that I am inserting into the table match a particular field within the table that I am inserting into. Here is what I've tried:
INSERT INTO "USER"."TABLE1" AS A1
SELECT *
FROM "USER"."TABLE2" AS A2
WHERE A2."HIERARCHYLEVEL" = 2
AND A2."PARENT" = A1."INSTANCE"
Obviously, this was to no avail. I've tried a couple of other queries, but they didn't get me anywhere, either. Any help would be much appreciated.
EDIT:
I would like to add rows to this table, not add columns to the table. The two tables are of the exact same structure -- in fact, I extracted the data already in table1 from table2. What I have in table1 currently is a bunch of records who have NO PARENT, but an instance. What I want to add is all the records who have a parent in table2 that are equal to the instance in table 1.

Currently there is no way to join on a table when inserting. The solution with the subselect, where you select from the table, is the correct one.
Aliasing the table you want to change is only possible with UPDATE, UPSERT and MERGE. For these operations it makes sense, as you need to match a column and then decide whether to update it or insert something instead. In your example the row from table1 that you match is not relevant, as you don't want to change it, so from the statement's point of view it does not really matter that the table you use in your subselect is the same as the one you insert into.
As an alternative, I can suggest the following solution, which is equivalent to yours:
INSERT INTO "user"."table1"
SELECT
A1."ROOT",
A1."INSTANCE",
A1."PARENT",
A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1
WHERE A1."INSTANCE" in (select "PARENT" from "user"."table1")
AND A1."HIERARCHYLEVEL" = 2

This gave me the answer I was looking for, although I am sure there is an easier -- or more efficient -- way to do it.
INSERT INTO "user"."table1"
SELECT
A1."ROOT",
A1."INSTANCE",
A1."PARENT",
A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1,
"user"."table1" AS A2
WHERE A1."INSTANCE" = A2."PARENT"
AND A2."HIERARCHYLEVEL" = 2
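The subselect approach, where the INSERT reads from the very table it writes to, can be sketched as follows in Python with the standard sqlite3 module. This is an illustration under assumptions, not the original system: the sample hierarchy is invented, the function name insert_children is hypothetical, and the matching direction follows the wording of the question's edit (the parent in table2 equals an instance already in table1); swap the two columns if your data is related the other way round.

```python
import sqlite3


def insert_children():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (root TEXT, instance INTEGER, parent INTEGER, hierarchylevel INTEGER);
        CREATE TABLE table2 (root TEXT, instance INTEGER, parent INTEGER, hierarchylevel INTEGER);
        INSERT INTO table2 VALUES
            ('R', 1, NULL, 1),
            ('R', 2, 1, 2),
            ('R', 3, 1, 2),
            ('R', 4, 2, 3),
            ('R', 5, 9, 2);
        -- table1 starts with the parentless records only
        INSERT INTO table1 SELECT * FROM table2 WHERE parent IS NULL;
        """
    )
    # The target table appears only inside the IN subselect, so it needs no
    # alias, and each table2 row is inserted at most once.
    conn.execute(
        """
        INSERT INTO table1
        SELECT t2.root, t2.instance, t2.parent, t2.hierarchylevel
        FROM table2 AS t2
        WHERE t2.hierarchylevel = 2
          AND t2.parent IN (SELECT instance FROM table1)
        """
    )
    instances = [
        row[0]
        for row in conn.execute("SELECT instance FROM table1 ORDER BY instance")
    ]
    conn.close()
    return instances


if __name__ == "__main__":
    print(insert_children())
```

One reason to prefer IN over a comma join here: if several rows in table1 match the same row of table2, a join inserts that row once per match, while IN inserts it once.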

Compare rows in different table having same columns

I have 2 tables, tbl_A and tbl_A_temp. Both tables have the same schema. Their primary keys differ, since they are identity columns. Is there a way I can compare the rows in these two tables and find out whether they differ? I will be inserting data from tbl_A_temp into tbl_A; I need this comparison just to make sure that I am not inserting any duplicate data into the main table.
Regards,
Amit

I think this should work for you. Basically, since you don't have a primary key to join on, you'll need to perform a LEFT JOIN on all your other fields. If any are different, then the NULL check will be true:
SELECT t.*
FROM tbl_A_temp t
LEFT JOIN tbl_A a ON
t.field1=a.field1 AND t.field2=a.field2 AND ...
WHERE a.field1 IS NULL
I've also seen others use CHECKSUM, but have run into issues myself with it returning false positives.
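The LEFT JOIN check can be combined with the insert itself, so that only rows without a counterpart are copied. Here is a minimal sketch in Python with the standard sqlite3 module; the two data columns field1 and field2 follow the placeholder names in the query above, the function name insert_new_rows and the sample rows are hypothetical, and the NULL test uses the key column a.id (which is never NULL in a real match) instead of a.field1.

```python
import sqlite3


def insert_new_rows(existing, incoming):
    """Copy rows from tbl_A_temp into tbl_A unless an equal row exists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbl_A (id INTEGER PRIMARY KEY AUTOINCREMENT, field1 TEXT, field2 TEXT);
        CREATE TABLE tbl_A_temp (id INTEGER PRIMARY KEY AUTOINCREMENT, field1 TEXT, field2 TEXT);
        """
    )
    conn.executemany("INSERT INTO tbl_A (field1, field2) VALUES (?, ?)", existing)
    conn.executemany("INSERT INTO tbl_A_temp (field1, field2) VALUES (?, ?)", incoming)
    # Join on every non-key column; a.id IS NULL marks temp rows for which
    # the join found no equal row in tbl_A.
    conn.execute(
        """
        INSERT INTO tbl_A (field1, field2)
        SELECT t.field1, t.field2
        FROM tbl_A_temp AS t
        LEFT JOIN tbl_A AS a
          ON t.field1 = a.field1 AND t.field2 = a.field2
        WHERE a.id IS NULL
        """
    )
    rows = conn.execute("SELECT field1, field2 FROM tbl_A ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(insert_new_rows([("a", "1"), ("b", "2")], [("b", "2"), ("c", "3")]))
```

Note that = never matches NULL against NULL, so a temp row containing NULL in a compared field is always treated as new unless the join condition handles NULLs explicitly.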

SQL Server join and wildcards

I want to get the results of a left join between two tables, with both having a column of the same name, the column on which I join. The following query is seen as valid by the import/export wizard in SQL Server, but it always gives an error. I have some more conditions, so the size wouldn't be too much. We're using SQL Server 2000 iirc and since we're using an externally developed program to interact with the database (except for some information we can't retrieve that way), we can not simply change the column name.
SELECT table1.*, table2.*
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
At least, I think the column name is the problem, or am I doing something else wrong?

Do more columns than just your join key have the same name? If only your join key has the same name then simply select one of them since the values will be equivalent except for the non-matching rows (which will be NULL). You will have to enumerate all your other columns from one of the tables though.
SELECT table1.*, table2.othercolumns
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename

You may need to explicitly list the columns from one of the tables (the one with fewer fields), and leave out the second instance of what would be the duplicate field.
select Table1.*, {skip the field Table2.sameName} Table2.fld2, Table2.Fld3, Table2.Fld4... from
Since it's a common column, it appears that it's trying to create the column twice in the result set, thus choking your process.

Since you should never use select *, simply replace it with the names of the columns you want. The join column has the same value (or null) on both sides of the join, so only select one of them: the one from table1, which will always have the value.

If you want to select all the columns from both tables just use Select * instead of including the tables separately. That will however leave you with duplicate column names in the result set, so even reading them out by name will not work and reading them by index will give inconsistent results, as changing the columns in the database will change the resultset, breaking any code depending on the ordinals of the columns.
Unfortunately the best solution is to specify exactly the columns you need and create aliases for the duplicates so they are unique.
I quickly get the column headings by setting the query to text mode and copying the top row ...
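Listing the columns explicitly and aliasing the ones that would clash might look like the following. This is a sketch in Python with the standard sqlite3 module, not SQL Server 2000; the non-key columns fld_a and fld_b, the aliases, and the sample rows are invented for illustration.

```python
import sqlite3


def joined_columns_and_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (samename INTEGER, fld_a TEXT);
        CREATE TABLE table2 (samename INTEGER, fld_b TEXT);
        INSERT INTO table1 VALUES (1, 'left only'), (2, 'both');
        INSERT INTO table2 VALUES (2, 'match');
        """
    )
    # The join key is selected once, from table1, where it is always
    # populated; every other column gets an alias that is unique in the
    # result set.
    cur = conn.execute(
        """
        SELECT table1.samename AS samename,
               table1.fld_a    AS t1_fld_a,
               table2.fld_b    AS t2_fld_b
        FROM table1
        LEFT JOIN table2 ON table1.samename = table2.samename
        ORDER BY table1.samename
        """
    )
    names = [column[0] for column in cur.description]
    rows = cur.fetchall()
    conn.close()
    return names, rows


if __name__ == "__main__":
    names, rows = joined_columns_and_rows()
    print(names)
    for row in rows:
        print(row)
```

Because every output column has a distinct name, code that reads the result by name keeps working even if columns are later added to either table.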
